SaltyCrane Blog

Notes on Python, Django, and web development on Ubuntu Linux

    

Remove leading and trailing whitespace from a csv file with Python

I'm reading a csv file with the Python csv module and could not find a setting to remove trailing whitespace. I found the Dialect.skipinitialspace setting, but I think it only applies to leading whitespace. Here's a one-liner to delete leading and trailing whitespace that worked for me.

import csv


reader = csv.DictReader(
    open('myfile.csv'),
    fieldnames=('myfield1', 'myfield2', 'myfield3'),
)

# skip the header row
next(reader)

# remove leading and trailing whitespace from all values
reader = (
    dict((k, v.strip()) for k, v in row.items() if v) for row in reader)

# print results
for row in reader:
    print(row)
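
To check the guess about `skipinitialspace`, here is a minimal sketch in Python 3 syntax. The sample line and the variable names are made up for illustration, and `io.StringIO` stands in for a real file.

```python
import csv
import io

# Hypothetical sample line: the fields are padded with spaces.
sample = "alice , paris , fr \n"

# Default dialect: whitespace on both sides of each field is kept.
default_row = next(csv.reader(io.StringIO(sample)))

# skipinitialspace=True drops the space that follows each delimiter,
# but the space before the delimiter is still part of the field.
skip_row = next(csv.reader(io.StringIO(sample), skipinitialspace=True))

# Stripping each value by hand removes both sides.
stripped_row = [value.strip() for value in skip_row]

print(default_row)   # ['alice ', ' paris ', ' fr ']
print(skip_row)      # ['alice ', 'paris ', 'fr ']
print(stripped_row)  # ['alice', 'paris', 'fr']
```

The trailing space survives `skipinitialspace=True`, which is why the explicit `strip()` is still needed.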

3 Comments


#1 Tom commented on 2011-10-21:

Wouldn't this load all the csv values into memory? And is this a practical solution for large data-sets?


#2 Eliot commented on 2011-10-21:

Tom:
It actually creates another generator so it won't load all values into memory at one time.

print(type(reader))

gives:

<type 'generator'>

If you make it a list instead of a generator (change the parentheses to square brackets), it will load all values into memory at one time:

reader = [
    dict((k, v.strip()) for k, v in row.items() if v) for row in reader]

print(type(reader))

gives:

<type 'list'>
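
A quick way to see the difference is to count how many lines have been pulled from the source. This is a minimal sketch in Python 3 syntax. The `lines()` generator and the `consumed` list are hypothetical stand-ins for a real file.

```python
import csv

consumed = []  # records every line the csv reader has asked for


def lines():
    # Hypothetical stand-in for a large file: yields one line at a time.
    for line in ["a , b \n", "c , d \n", "e , f \n"]:
        consumed.append(line)
        yield line


reader = csv.DictReader(lines(), fieldnames=("left", "right"))

# Generator version: building it reads nothing from the source.
stripped = (
    dict((k, v.strip()) for k, v in row.items() if v) for row in reader)
count_before = len(consumed)

# Asking for one row pulls exactly one line.
first = next(stripped)
count_after_first = len(consumed)

# Turning the rest into a list pulls everything that is left.
rest = list(stripped)
count_after_list = len(consumed)

print(count_before, count_after_first, count_after_list)  # 0 1 3
```

The generator only touches the source when a row is requested, while `list(...)` (or square brackets) drains it in one go.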

#3 Johannes commented on 2011-12-13:

Although not a Python solution, you may be interested in looking into Google Refine for tasks like this. It makes cleaning up CSV data a breeze. It also has support for Jython so you can run Python expressions on the data.
