Sorting Python Dictionary By Value, Take 2

09 Sep 2008

Here's why another reason why you should blog all your technical problems and solutions. Sometimes someone finds a better way.

In my original sorting a dict by value entry, I said the best way is:

sorted(adict.iteritems(), key=lambda (k,v): v)

Turns out I'm wrong. Gregg Lind (aka "write-only") replied to the article with a comment pointing to his performance notes. PEP 0265 has the "best answer" that is at least 2x faster:

from operator import itemgetter
sorted(d.iteritems(), key=itemgetter(1))

Thanks all!

My performance test is here:

#!/usr/bin/env python                                                           

import cProfile

def sbv0(adict,reverse=False):
    return sorted(adict.iteritems(), key=lambda (k,v): v, reverse=reverse)

from operator import itemgetter
def sbv6(d,reverse=False):
    return sorted(d.iteritems(), key=itemgetter(1), reverse=reverse)

imax= 10000
dmax = 500
D = dict(zip([str(i) for i in range(dmax)],range(dmax)))'for i in xrange(imax): sbv0(D, reverse=False)')'for i in xrange(imax): sbv6(D, reverse=False)')

Results are

Old Way:
5020002 function calls in 6.623 CPU seconds

New Way:
     20002 function calls in 3.920 CPU seconds

Comment 2008-09-10 by None

You have a small mistake that doesn't affect the result:
In sbv0 you are not sorting by value, but by value and then by key.

To sort by value you would write

key = lambda (k,v): v

This improves the result by a little bit, but still the itemgetter wins.

Comment 2008-09-10 by None

thanks lorg! updated with corrections. --nickg

Comment 2008-09-10 by None

I'm glad that this topic has created so much good discussion. There aren't a whole lot of people doing performance profiling on python code.