Remove leading and trailing whitespace from a csv file with Python
I'm reading a CSV file with the Python csv module and could not find a setting to remove trailing whitespace. I found the Dialect.skipinitialspace setting, but I think it only applies to leading whitespace (the csv docs describe it as ignoring spaces immediately following the delimiter). Here's a one-liner to delete leading and trailing whitespace that worked for me.
import csv

reader = csv.DictReader(
    open('myfile.csv'),
    fieldnames=('myfield1', 'myfield2', 'myfield3'),
)
# skip the header row
next(reader)
# remove leading and trailing whitespace from all values
reader = (
    dict((k, v.strip()) for k, v in row.items()) for row in reader)
# print results (the parenthesized form works on Python 2 and Python 3)
for row in reader:
    print(row)
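To check the skipinitialspace behavior and try the stripping step without a file on disk, here is a minimal sketch. It assumes Python 3 and reads from an in-memory string; the sample data, the field names, and the helper name `strip_rows` are made up for illustration and are not part of the original snippet.

```python
import csv
import io

# Hypothetical sample data: values padded with spaces around the delimiter.
SAMPLE = "name , city \nalice , paris \nbob , oslo \n"


def strip_rows(rows):
    """Lazily yield each row dict with whitespace stripped from string values."""
    for row in rows:
        # Non-string values (e.g. None for missing fields) are passed through.
        yield {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


# skipinitialspace=True only drops the space right after a delimiter,
# so the space before the delimiter and at the end of the line survives.
raw = list(csv.reader(io.StringIO(SAMPLE), skipinitialspace=True))

# Same idea as the one-liner above: wrap the DictReader and strip every value.
reader = csv.DictReader(io.StringIO(SAMPLE), fieldnames=("name", "city"))
next(reader)  # skip the header row
cleaned = list(strip_rows(reader))

print(raw[1])
print(cleaned)
```

In this sketch the row read with skipinitialspace still carries its trailing spaces, while the rows passed through `strip_rows` do not. The `isinstance` check is an addition for rows shorter than the field list, where DictReader fills in None by default.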
3 Comments
#2 Eliot commented on 2011-10-21:
Tom:
It actually creates another generator so it won't load all values into memory at one time.
print(type(reader))
gives (on Python 2; Python 3 prints <class 'generator'>):
<type 'generator'>
If you make it a list instead of a generator (change the parentheses to square brackets), it will load all values into memory at one time:
reader = [
    dict((k, v.strip()) for k, v in row.items()) for row in reader]
print(type(reader))
gives (on Python 2; Python 3 prints <class 'list'>):
<type 'list'>
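The difference between the two forms can be observed directly. The following is a small sketch, assuming Python 3 and in-memory sample data; the `tracked` and `make_reader` helpers and the `consumed` list are hypothetical names added only to count how many rows have been pulled from the csv reader.

```python
import csv
import io

# Hypothetical sample data with a clean header and padded values.
SAMPLE = "a,b\n1 , 2 \n3 , 4 \n"

consumed = []  # records each raw row at the moment the csv reader yields it


def tracked(reader):
    """Pass rows through unchanged, noting when each one is actually read."""
    for row in reader:
        consumed.append(row)
        yield row


def make_reader():
    """Build a fresh tracked DictReader over the sample data."""
    return tracked(csv.DictReader(io.StringIO(SAMPLE)))


# Generator expression: no row is read until something iterates over it.
lazy = ({k: v.strip() for k, v in row.items()} for row in make_reader())
read_before_iterating = len(consumed)
first = next(lazy)
read_after_one = len(consumed)

# List comprehension: every row is read and kept in memory right away.
consumed.clear()
eager = [{k: v.strip() for k, v in row.items()} for row in make_reader()]
read_by_list = len(consumed)

print(read_before_iterating, read_after_one, read_by_list)
print(type(lazy), type(eager))
```

With the generator, rows are pulled one at a time as the loop asks for them, which is why the approach stays practical for large files. The list form trades that for the ability to index and re-iterate the results.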
#3 Johannes commented on 2011-12-13:
Although not a Python solution, you may be interested in looking into Google Refine for tasks like this. It makes cleaning up CSV data a breeze. It also has support for Jython, so you can run Python expressions on the data.
About
I'm Eliot and this is my notepad for programming topics such as Python, Django, Ubuntu, Emacs, etc.
#1 Tom commented on 2011-10-21:
Wouldn't this load all the csv values into memory? And is this a practical solution for large data-sets?