SaltyCrane Blog — Notes on JavaScript and web development

How to sort a table by columns in Python

I have a 2-dimensional table of data implemented as a list of lists in Python. I would like to sort the data by an arbitrary column. This is a common task with tabular data. For example, Windows Explorer allows me to sort the list of files by Name, Size, Type, or Date Modified. I tried the code from this article, however, if there are duplicate entries in the column being sorted, the duplicates are removed. This is not what I wanted, so I did some further searching, and found a nice solution from the HowTo/Sorting article on the PythonInfo Wiki. This method also uses the built-in sorted() function, as well as the key paramenter, and operator.itemgetter(). (See section 2.1 and 6.7 of the Python Library Reference for more information.) The following code sorts the table by the second column (index 1). Note, Python 2.4 or later is required.

import operator

def sort_table(table, col=0):
    return sorted(table, key=operator.itemgetter(col))

if __name__ == '__main__':
    mytable = (
        ('Joe', 'Clark', '1989'),
        ('Charlie', 'Babbitt', '1988'),
        ('Frank', 'Abagnale', '2002'),
        ('Bill', 'Clark', '2009'),
        ('Alan', 'Clark', '1804'),
        )
    for row in sort_table(mytable, 1):
        print row

Results:
('Frank', 'Abagnale', '2002')
('Charlie', 'Babbitt', '1988')
('Joe', 'Clark', '1989')
('Bill', 'Clark', '2009')
('Alan', 'Clark', '1804')

This works well, but I would also like the table to be sorted by column 0 in addition to column 1. In this example, column 1 holds the Last Name and column 0 holds the First Name. I would like the table to be sorted first by Last Name, and then by First Name. Here is the code to sort the table by multiple columns. The cols argument is a tuple specifying the columns to sort by. The first column to sort by is listed first, the second second, and so on.
import operator

def sort_table(table, cols):
    """ sort a table by multiple columns
        table: a list of lists (or tuple of tuples) where each inner list 
               represents a row
        cols:  a list (or tuple) specifying the column numbers to sort by
               e.g. (1,0) would sort by column 1, then by column 0
    """
    for col in reversed(cols):
        table = sorted(table, key=operator.itemgetter(col))
    return table

if __name__ == '__main__':
    mytable = (
        ('Joe', 'Clark', '1989'),
        ('Charlie', 'Babbitt', '1988'),
        ('Frank', 'Abagnale', '2002'),
        ('Bill', 'Clark', '2009'),
        ('Alan', 'Clark', '1804'),
        )
    for row in sort_table(mytable, (1,0)):
        print row

Results:
('Frank', 'Abagnale', '2002')
('Charlie', 'Babbitt', '1988')
('Alan', 'Clark', '1804')
('Bill', 'Clark', '2009')
('Joe', 'Clark', '1989')

Comments


#1 Anonymous commented on :

Very Helpful.
Thanks.


#2 Ric. :-) commented on :

Thanks a lot!! :-) It's that I was looking for! :-)

Ric.


#3 Zack commented on :

Awesome. Thanks!


#4 Yar commented on :

Really useful ! Thanks !


#5 Carlo commented on :

Just what I was after. Thanks.


#6 Orangeboy commented on :

Thanks! New to python, and had trouble with complex sorting. This does the trick!


#7 Senyai commented on :

Nice, works as fast as sorted(table, cmp=lambda a, b:cmp((a[1],a[0]), (b[1],b[0])))


#8 Will commented on :

Thank you for this! Very simple and straightforward, that's what I needed!


#9 mana commented on :

http://wiki.python.org/moin/HowTo/Sorting


#10 lampard8 commented on :

why do you have to use "for col in reversed(cols)", instead of "for col in range(cols)"?? tks


#11 lampard8 commented on :

sorry i mean why do you have to "reverse" the col numbers instead of just passing the normal order?