Options for listing the files in a directory with Python
I do a lot of sysadmin-type work with Python, so I often need to list the contents of a directory on a filesystem. Here are four methods I've used so far to do that. Let me know if you have any good alternatives. The examples were run on my Ubuntu Karmic machine.
OPTION 1 - os.listdir()
This is probably the simplest way to list the contents of a directory in Python.
import os
dirlist = os.listdir("/usr")
from pprint import pprint
pprint(dirlist)
Results:
['lib', 'shareFeisty', 'src', 'bin', 'local', 'X11R6', 'lib64', 'sbin', 'share', 'include', 'lib32', 'man', 'games']
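As the results show, os.listdir() returns bare names (no directory prefix) in arbitrary order. Here is a minimal sketch of how the list could be sorted and turned into full paths. The helper names listdir_fullpaths and list_subdirs are made up for illustration and are not part of the os module.

```python
import os


def listdir_fullpaths(dirpath):
    """Return the entries of dirpath as full paths, sorted by name.

    Hypothetical helper for illustration only.
    """
    # os.listdir() gives bare names in arbitrary order, so sort them first
    names = sorted(os.listdir(dirpath))
    # Join each name back onto the directory to get a usable path
    return [os.path.join(dirpath, name) for name in names]


def list_subdirs(dirpath):
    """Return only the entries of dirpath that are directories."""
    return [p for p in listdir_fullpaths(dirpath) if os.path.isdir(p)]
```

Calling listdir_fullpaths('/usr') would give the same entries as above, but sorted and prefixed with /usr/.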
OPTION 2 - glob.glob()
This method allows you to use shell-style wildcards.
import glob
dirlist = glob.glob('/usr/*')
from pprint import pprint
pprint(dirlist)
Results:
['/usr/lib', '/usr/shareFeisty', '/usr/src', '/usr/bin', '/usr/local', '/usr/X11R6', '/usr/lib64', '/usr/sbin', '/usr/share', '/usr/include', '/usr/lib32', '/usr/man', '/usr/games']
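The example above only uses "*", but glob also understands "?" and "[...]" character sets. Two things are worth knowing: the matches come back in arbitrary order, and a leading "*" does not match names that start with a dot. Below is a small sketch that wraps this up. The name glob_sorted is an assumption for illustration, and it assumes the directory path itself contains no wildcard characters.

```python
import glob
import os


def glob_sorted(dirpath, pattern):
    """Return sorted paths in dirpath whose names match a shell-style pattern.

    Hypothetical helper; assumes dirpath contains no wildcard characters.
    """
    # glob.glob() returns matches in arbitrary order; sort for stable output
    return sorted(glob.glob(os.path.join(dirpath, pattern)))
```

For instance, glob_sorted('/var/log', '*.log') would return only the entries ending in .log, with the /var/log/ prefix kept.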
OPTION 3 - Unix "ls" command using subprocess
This method uses your operating system's "ls" command. It allows you to sort
the output based on modification time, file size, etc. by passing these command-line
options to the "ls" command. The following example lists the 10 most recently
modified files in /var/log:
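from subprocess import Popen, PIPE
from pprint import pprint

def listdir_shell(path, *lsargs):
    # Options go before the path so that non-GNU versions of ls honor them too
    p = Popen(('ls',) + lsargs + (path,), shell=False, stdout=PIPE,
              close_fds=True, universal_newlines=True)
    # communicate() reads all of the output as text and waits for ls to exit
    return p.communicate()[0].splitlines()

dirlist = listdir_shell('/var/log', '-t')[:10]
pprint(dirlist)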
Results:
['auth.log', 'syslog', 'dpkg.log', 'messages', 'user.log', 'daemon.log', 'debug', 'kern.log', 'munin', 'mysql.log']
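If spawning a subprocess is not desirable, the modification-time ordering of "ls -t" can be approximated in pure Python. This is only a sketch under a few assumptions: the name listdir_by_mtime is made up for illustration, and unlike ls it includes dotfiles and makes no promise about how entries with identical timestamps are ordered.

```python
import os


def listdir_by_mtime(dirpath, limit=None):
    """Return entry names in dirpath, most recently modified first.

    Hypothetical helper that roughly mimics "ls -t".
    """
    names = os.listdir(dirpath)
    # os.lstat() does not follow symlinks, so a broken link cannot raise here
    names.sort(key=lambda name: os.lstat(os.path.join(dirpath, name)).st_mtime,
               reverse=True)
    # A limit of None keeps the whole list
    return names[:limit]
```

With this, listdir_by_mtime('/var/log', 10) would play the same role as the listdir_shell call above.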
OPTION 4 - Unix "find" style using os.walk
This method allows you to list directory contents recursively
in a manner similar to the Unix "find" command. It uses Python's
os.walk.
import os

def unix_find(pathin):
    """Return results similar to the Unix find command run without options
    i.e. traverse a directory tree and return all the file paths
    """
    return [os.path.join(path, file)
            for (path, dirs, files) in os.walk(pathin)
            for file in files]

pathlist = unix_find('/etc')[-10:]
from pprint import pprint
pprint(pathlist)
Results:
['/etc/fonts/conf.avail/20-lohit-gujarati.conf', '/etc/fonts/conf.avail/69-language-selector-zh-mo.conf', '/etc/fonts/conf.avail/11-lcd-filter-lcddefault.conf', '/etc/cron.weekly/0anacron', '/etc/cron.weekly/cvs', '/etc/cron.weekly/popularity-contest', '/etc/cron.weekly/man-db', '/etc/cron.weekly/apt-xapian-index', '/etc/cron.weekly/sysklogd', '/etc/cron.weekly/.placeholder']
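The same os.walk loop can be combined with the fnmatch module to filter by name, roughly in the spirit of "find -name". The sketch below is a generator, so it does not build the whole list in memory for a large tree. The name find_by_name is hypothetical, and unlike find it only looks at the non-directory entries that os.walk reports.

```python
import fnmatch
import os


def find_by_name(top, pattern):
    """Yield paths under top whose file name matches a shell-style pattern.

    Hypothetical helper; only file names are tested, never directory names.
    """
    for dirpath, dirnames, filenames in os.walk(top):
        # fnmatch.filter() keeps only the names that match the pattern
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)
```

Something like list(find_by_name('/etc', '*.conf')) would then collect every .conf file below /etc.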
Comments
#2 Eliot commented on 2010-05-18:
Keith: That's a good tip. I will give it a try the next time I get a chance. Thanks!
#3 Al Jaffe commented on 2010-06-14:
...and how about an easy way for listing contents of a WEB directory? Could any of the above techniques be used?
#4 Directory commented on 2010-11-23:
I'm just learning Python for my job and this has been a really useful reference page for me!! I realise it's only really useful for one thing, but the methods you've shown are perfect for particular types of directory listings in my code ;).
#5 gsiliceo commented on 2011-04-16:
I recently started learning Python and I love your blog. I'm constantly looking for best practices and "solved" problems.
#6 Eriksen commented on 2011-08-22:
I'm also just learning Python for my job and this has been a really useful reference page for me.
I hope you can post more about system administration on both Unix and Windows.
Keep up the good work man ;)
#1 Keith Beattie commented on 2010-05-17:
Adding a regexp to your option #1 is a quick way to get Python's re module into play when sh regexps won't cut it:
gives me (on my FreeBSD box)