SaltyCrane Blog — Notes on JavaScript and web development

Remove leading and trailing whitespace from a csv file with Python

I'm reading a csv file with the Python csv module and could not find a setting to remove trailing whitespace. I found this setting, Dialect.skipinitialspace, but it I think it only applies to leading whitespace. Here's a one-liner to delete leading and trailing whitespace that worked for me.

import csv


reader = csv.DictReader(
    open('myfile.csv'),
    fieldnames=('myfield1', 'myfield1', 'myfield3'),
)

# skip the header row
next(reader)

# remove leading and trailing whitespace from all values
reader = (
    dict((k, v.strip()) for k, v in row.items() if v) for row in reader)

# print results
for row in reader:
    print row

Comments


#1 Tom commented on :

Wouldn't this load all the csv values into memory? And is this a practical solution for large data-sets?


#2 Eliot commented on :

Tom:
It actually creates another generator so it won't load all values into memory at one time.

print type(reader)

gives:

<type 'generator'>

If you make it a list instead of a generator (change the parentheses to square brackets), it will load all values into memory at one time:

reader = [
    dict((k, v.strip()) for k, v in row.items()) for row in reader]

print type(reader)

gives:

<type 'list'>

#3 Johannes commented on :

Although not a python solution, you may be interested in looking into Google Refine for tasks like this. It makes cleaning up CSV data a breeze. It also has support for Jython so you can run Python expressions on the data.