I'm reading a csv file with the Python csv module and could not find a setting to remove trailing whitespace. I found this setting, Dialect.skipinitialspace, but it I think it only applies to leading whitespace. Here's a one-liner to delete leading and trailing whitespace that worked for me.
import csv reader = csv.DictReader( open('myfile.csv'), fieldnames=('myfield1', 'myfield1', 'myfield3'), ) # skip the header row next(reader) # remove leading and trailing whitespace from all values reader = ( dict((k, v.strip()) for k, v in row.items() if v) for row in reader) # print results for row in reader: print row
Wouldn't this load all the csv values into memory? And is this a practical solution for large data-sets?
It actually creates another generator so it won't load all values into memory at one time.
If you make it a list instead of a generator (change the parentheses to square brackets), it will load all values into memory at one time:
reader = [ dict((k, v.strip()) for k, v in row.items()) for row in reader] print type(reader)
Although not a python solution, you may be interested in looking into Google Refine for tasks like this. It makes cleaning up CSV data a breeze. It also has support for Jython so you can run Python expressions on the data.