Frequency Counting in Python

21 Nov 2007

Let's say you need to keep track of frequency counts of arbitrary items, for instance when parsing a log file, or when doing a word count to see which terms come up most often.
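At its core, frequency counting is just a dictionary lookup with a default of zero. A minimal sketch of a word count, assuming whitespace-separated words and case-insensitive matching (the sample text and the name `counts` are illustrative only):

```python
text = "the quick brown fox jumps over the lazy dog the end"

# Map each word to the number of times it appears.
counts = {}
for word in text.lower().split():
    # get() returns 0 for unseen words, so the first occurrence stores 1.
    counts[word] = counts.get(word, 0) + 1

print(counts["the"])  # 3
print(counts["fox"])  # 1
```

Every `counts[word] = counts.get(word, 0) + 1` line is the same bookkeeping, which is what makes it worth wrapping in a class.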

Python lets you subclass the built-in dict class (a hash table). An extension to aid in frequency counting is listed below. It's nothing particularly special, but it's simple, it works, and it's fast. Enjoy!

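class dictcount(dict):
    """A dict that tallies how often each key has been seen."""

    def add(self, key, value=1):
        # Increment the count for key, treating missing keys as 0.
        self[key] = self.get(key, 0) + value

    def sum(self):
        # Total of all counts (this calls the builtin sum, not this method).
        return sum(self.values())

    def sortByValue(self, reverse=True):
        # (key, count) pairs sorted by count, with the key as a tiebreaker.
        # Highest counts come first by default.
        return sorted(self.items(),
                      key=lambda kv: (kv[1], kv[0]),
                      reverse=reverse)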
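Because the sort key is the tuple (count, key), items with equal counts are ordered by key as a tiebreaker, and `reverse=True` flips that ordering too, so ties come out in descending key order. A small check using a trimmed copy of the class (the fruit data is made up):

```python
class dictcount(dict):
    def add(self, key, value=1):
        self[key] = self.get(key, 0) + value

    def sortByValue(self, reverse=True):
        # Sort by count first, then by key to break ties.
        return sorted(self.items(), key=lambda kv: (kv[1], kv[0]), reverse=reverse)

d = dictcount()
for word in ["pear", "apple", "fig", "apple", "pear", "kiwi"]:
    d.add(word)

print(d.sortByValue())
# [('pear', 2), ('apple', 2), ('kiwi', 1), ('fig', 1)]
print(d.sortByValue(reverse=False))
# [('fig', 1), ('kiwi', 1), ('apple', 2), ('pear', 2)]
```

One consequence of using the key as a tiebreaker: in Python 3, if two keys with the same count cannot be compared with each other (say, a string and an integer), `sorted()` raises a `TypeError`.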

Sample run:

$ python
>>> d = dictcount()
>>> d.add('foo')
>>> d.add('foo')
>>> d.add('bar', 2)
>>> d.add('bar', 3)
>>> d.sum()
7
>>> d.sortByValue()
[('bar', 5), ('foo', 2)]
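Python 2.7 and 3.1 later added `collections.Counter`, a dict subclass built for this same job. A rough equivalent of the sample run above using it; note that `most_common()` does not fall back to the key when counts are equal, so tie ordering can differ from `sortByValue()`:

```python
from collections import Counter

c = Counter()
c["foo"] += 1      # missing keys read as 0, so += works directly
c["foo"] += 1
c["bar"] += 2
c["bar"] += 3

print(sum(c.values()))  # 7
print(c.most_common())  # [('bar', 5), ('foo', 2)]
```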

Of course, all the regular dict methods are available too.
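Since `dictcount` only adds methods, inherited ones such as `len()`, `in`, and `keys()` behave exactly as on a plain dict. One thing worth watching: `update()` is not overridden, so it replaces counts rather than accumulating them. A short sketch with a trimmed copy of the class:

```python
class dictcount(dict):
    def add(self, key, value=1):
        self[key] = self.get(key, 0) + value

d = dictcount()
d.add("foo", 2)
d.add("bar", 5)

# Inherited dict methods work unchanged.
print(len(d))            # 2
print("foo" in d)        # True
print(sorted(d.keys()))  # ['bar', 'foo']

# Caution: update() is plain dict.update, so it overwrites counts
# instead of adding to them.
d.update({"foo": 10})
print(d["foo"])          # 10, not 12
```

To merge counts from another mapping, loop over its items and call `add()` for each one.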