Sorting a List of Dictionaries in Python
Sorting a list of dictionaries by the values of some key is something that comes up pretty frequently for me. Basically, you have a list of dictionaries that you want sorted by some particular key. So if we have
[{'name': 'Bert', 'age':24}, {'name': 'Adam', 'age': 27}, {'name': 'Claire', 'age': 25}]
and we wanted to sort by age, we would get
[{'name': 'Bert', 'age':24}, {'name': 'Claire', 'age': 25}, {'name': 'Adam', 'age': 27}]
Lots of people ask this question and there’s a ton of different solutions out there. I had some free time so I decided to test them out.
Setup
from random import random d = [dict([(j,random()) for j in xrange(10)]) for i in xrange(500000)]
This generates an unsorted list of 500000 dictionaries, with each dictionary having 10 key/value pairs. The value for each key is drawn from a uniform distribution.
Python has two builtin methods for sorting lists. If you don’t need the original list, use list.sort() since it’s slightly more efficient. If you want to keep the original list, use sorted(list). I ran the following experiments using both.
Method 1
def compare_by(fieldname):
def compare_two_dicts (a, b):
return cmp(a[fieldname], b[fieldname])
return compare_two_dicts
d.sort(compare_by(KEY))
>>> 8.720021 seconds
d = sorted(d,compare_by(KEY))
>>> 9.655032 seconds
I found this method here and in terms of performance and elegance, comes up as the worst of the ones I surveyed.
Method 2
d.sort(lambda x,y : cmp(x[KEY], y[KEY])) >>> 8.667607 seconds d = sorted(d, lambda x,y : cmp(x[KEY], y[KEY])) >>> 9.260699 seconds
This is a little more efficient because it removes some unnecessary function calls, but we can do much better.
Method 3
d.sort(key=lambda k:k[KEY]) >>> 1.803021 seconds d = sorted(d,key=lambda k:k[KEY]) >>> 1.810326 seconds
This uses the key parameter that was built into in the sort functions starting with python 2.4, which kind of excuses the previous methods because I believe they were written prior to 2.4.
Method 4
from operator import itemgetter d.sort(key=itemgetter(KEY)) >>> 1.673376 seconds d = sorted(d,key=itemgetter(KEY)) >>> 1.656095 seconds
Using itemgetter is faster than lambda because it’s written in C, but I personally find the lambda method a little easier to grok. If speed is your thing though, use itemgetter.
You can find the code I used to run these experiments here: http://filer.case.edu/~dez4/sort.py