An Inside Look at Python Counter

What we learn in school is just a tiny fraction of what is available to us as programmers. My goal in this series of blog posts is to introduce you to some of the more advanced topics in several different languages. In this post I will talk specifically about the Counter class in Python’s collections module.

The other day I was writing some code to analyze a dataset of bit.ly link data. Each record in the dataset was stored as a dictionary and looked like the following:

{u'a': u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.78 Safari/535.11',
 u'al': u'en-US,en;q=0.8',
 u'c': u'US',
 u'cy': u'Danvers',
 u'g': u'A6qOVH',
 u'gr': u'MA',
 u'h': u'wfLQtf',
 u'hc': 1331822918,
 u'hh': u'1.usa.gov',
 u'l': u'orofrog',
 u'll': [42.576698, -70.954903],
 u'nk': 1,
 u'r': u'http://www.facebook.com/l/7AQEFzjSi/1.usa.gov/wfLQtf',
 u't': 1331923247,
 u'tz': u'America/New_York',
 u'u': u'http://www.ncbi.nlm.nih.gov/pubmed/22415991'}

The short string immediately following each u is the key; the u prefix itself just marks a Unicode string literal in Python 2. So the line u'tz': u'America/New_York' specifies that the time zone for this particular record is 'America/New_York'. Most of the records contained the 'tz' field and a corresponding time zone value. I wanted to list the 10 time zones that appeared most often in the data. The approach you learned in school would involve looping through the data and keeping a count of each time zone. Let’s say we had extracted a list of time zones that looked like the following:

In [42]: time_zones[:10]
Out[42]:
[u'America/New_York',
 u'America/Denver',
 u'America/New_York',
 u'America/Sao_Paulo',
 u'America/New_York',
 u'America/New_York',
 u'Europe/Warsaw',
 u'',
 u'',
 u'']
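The post doesn’t show how time_zones was built. A minimal sketch, assuming the records are held in a list of dictionaries called records (a hypothetical name, filled here with made-up sample rows), is a list comprehension that keeps the 'tz' value of every record that has one:

```python
# Hypothetical sample data: record dictionaries like the one above, trimmed
# down to a single field. The second record has no 'tz' key at all, and the
# third has an empty time zone string.
records = [
    {'tz': 'America/New_York'},
    {'c': 'US'},
    {'tz': ''},
    {'tz': 'America/Denver'},
]

# Skip records without a 'tz' key so the lookup never raises a KeyError.
time_zones = [rec['tz'] for rec in records if 'tz' in rec]

print(time_zones)  # ['America/New_York', '', 'America/Denver']
```

Empty strings are kept by this filter, which is why u'' entries show up in the real list above; add a check such as `and rec['tz']` if you don’t want to count them.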

Then we could do something like this:

from collections import defaultdict

def get_counts(sequence):
    counts = defaultdict(int)  # a missing key starts at int(), i.e. 0
    for x in sequence:
        counts[x] += 1
    return counts

Here, I’m creating a defaultdict of int values. A dictionary is basically an object that stores values by key; a defaultdict additionally supplies a default value (here int(), which is 0) the first time a missing key is accessed. Next, I iterate through the sequence, and every time I encounter a value I increment the count stored for that particular x by 1. So let’s say I had the following list:

student_grades = ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'F']

The code would begin by iterating over the list, so x would first become 'A'. The counts[x] += 1 line looks in the dictionary for the key x and increments its value by 1. Since this is the first time we have seen an 'A', the defaultdict supplies a starting value of 0, so the value for 'A' becomes 1. The loop continues and x becomes the second 'A'. When counts[x] += 1 runs again, the value for 'A' is already 1, so it becomes 2. Next, x becomes 'B', and since there is no 'B' in counts yet, its value starts at 0 and becomes 1, then 2, and so on.
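To watch the walkthrough above happen step by step, here is a small sketch (written for Python 3) that runs the same loop on student_grades and prints the running counts after each item:

```python
from collections import defaultdict

student_grades = ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'F']

counts = defaultdict(int)  # a missing key starts at int(), i.e. 0
for x in student_grades:
    counts[x] += 1
    # Show the grade just seen and the counts accumulated so far.
    print(x, dict(counts))
```

On Python 3.7 or later, where dictionaries keep insertion order, the first three lines printed are `A {'A': 1}`, `A {'A': 2}` and `B {'A': 2, 'B': 1}`, matching the description above.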

I would use the code like this:

cnts = get_counts(student_grades)

I can then get the count for a particular grade like this:

cnts['B']

Which would return 4.
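defaultdict is a convenience rather than a requirement. For comparison, here is a sketch of the same counting function written with a plain dict, where the missing-key case is handled by hand with dict.get (the name get_counts_plain is just for illustration):

```python
def get_counts_plain(sequence):
    counts = {}
    for x in sequence:
        # dict.get returns the fallback (0) when x is not a key yet,
        # which is what defaultdict(int) does for us automatically.
        counts[x] = counts.get(x, 0) + 1
    return counts

student_grades = ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'F']
cnts = get_counts_plain(student_grades)
print(cnts['B'])  # 4
```

One behavioral difference: looking up a grade that never appeared, such as cnts['Z'], raises a KeyError on a plain dict, whereas a defaultdict returns 0 and inserts 'Z' into the dictionary as a side effect.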

If I want to get the top ten I could do this:

def top_counts(count_dict, n=10):
    # n=10 gives a default value of 10 if the caller does not specify one.
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    value_key_pairs.sort()
    # The slice [-n:] takes the last n pairs, i.e. the n largest counts.
    return value_key_pairs[-n:]


This code begins by using a feature of the Python language called a list comprehension. I’ll write a separate blog post on those later. We’re extracting the keys and values from the dictionary and storing them as (count, key) pairs in a list, with the count first so that sorting orders the pairs by count. Next, we sort that list, and then we use a slice to count backwards from the end of the list, returning the last n items, which are the ones with the largest counts.
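Here is a self-contained sketch of the counting and top-n steps together, applied to the grades list. The slice is written as [-n:], which takes the last n pairs; [-n] without the colon would return only a single pair.

```python
from collections import defaultdict

def get_counts(sequence):
    counts = defaultdict(int)  # missing keys start at 0
    for x in sequence:
        counts[x] += 1
    return counts

def top_counts(count_dict, n=10):
    # Flip each (key, count) item to (count, key) so sorting orders by count;
    # ties are broken by comparing the keys.
    value_key_pairs = [(count, key) for key, count in count_dict.items()]
    value_key_pairs.sort()
    # The largest counts sit at the end of the ascending list; take the last n.
    return value_key_pairs[-n:]

student_grades = ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'F']
print(top_counts(get_counts(student_grades), n=2))  # [(2, 'A'), (4, 'B')]
```

The result comes back in ascending order, so the most common item is last; appending [::-1] to the returned list puts it first. If n is larger than the number of distinct items, the slice simply returns all of them.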

That code’s not so bad, but we can save ourselves a few keystrokes. In collections, the same module that provides defaultdict, there is a class called Counter.

We could shorten all that code to this:

from collections import Counter

counts = Counter(time_zones)
counts.most_common(10)

Giving us:

In [43]: counts.most_common(10)
Out[43]:
[(u'America/New_York', 1251),
 (u'', 521),
 (u'America/Chicago', 400),
 (u'America/Los_Angeles', 382),
 (u'America/Denver', 191),
 (u'Europe/London', 74),
 (u'Asia/Tokyo', 37),
 (u'Pacific/Honolulu', 36),
 (u'Europe/Madrid', 35),
 (u'America/Sao_Paulo', 33)]
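The same class also covers the smaller grades example from earlier. A short sketch (Python 3) showing a single lookup, a lookup of a missing key, and most_common:

```python
from collections import Counter

student_grades = ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'F']

counts = Counter(student_grades)

print(counts['B'])            # 4
print(counts['Z'])            # 0: a missing key counts as zero, no KeyError
print(counts.most_common(2))  # [('B', 4), ('A', 2)]
```

Unlike the hand-written top_counts function, most_common returns (key, count) pairs with the largest count first. Looking up a missing key on a Counter also does not add it to the dictionary, which is a small difference from defaultdict.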

Whew, that was so much easier. Always explore your language to see if there is something that will solve your task more easily. Don’t reinvent the wheel! If you want to find out more about the collections module, visit https://docs.python.org/2/library/collections.html.
