(Not too successfully) trying to use Unix tools instead of Python utility scripts
Inspired by articles such as "Why you should learn just a little Awk" and "Learn one sed command", I am trying to use Unix tools (sed, awk, grep, cut, uniq, sort, etc.) instead of writing short Python utility scripts.
Here is a Python script I wrote this week. It greps a file for a given regular expression pattern and returns a unique, sorted list of the text matched inside the capturing parentheses.
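As a rough illustration of the behaviour just described, here is a minimal sketch. It assumes the matches are collected with re.finditer and that the pattern has exactly one capturing group; the function name unique_matches is hypothetical and does not come from the script itself.

```python
import re


def unique_matches(pattern, text):
    """Return the sorted, de-duplicated contents of capture group 1."""
    # A set drops duplicates; sorted() turns it back into an ordered list.
    return sorted(set(m.group(1) for m in re.finditer(pattern, text)))
```

For example, calling unique_matches(r"def (\w+)", source) on a Python source string should list each function name once, in sorted order.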
# grep2.py
import re
import sys

def main():
    patt = sys.argv[1]
    filename = sys.argv[2]
    text = open(filename).read()
    matchlist = set(m.group(1) for m in ...
How to search C code for division or sqrt
The following Python script searches C code for division or sqrt and prints each matching line together with its line number. It skips C comments and string literals.
To use it, run: python find_divides.py filename.c
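The idea can also be sketched in a compact, self-contained form. This is not the script's own helper: it assumes a simplified pattern for comments and literals, blanks them out while keeping newlines so that line numbers stay aligned, and counts lines from 1. The names _SKIP, _blank, and find_divides are hypothetical.

```python
import re

# Assumed, simplified pattern: string literals, char literals,
# block comments, and line comments.
_SKIP = re.compile(
    r'"(?:\\.|[^"\\])*"' r"|'(?:\\.|[^'\\])*'" r"|/\*.*?\*/|//[^\n]*",
    re.DOTALL,
)


def _blank(match):
    # Keep the newlines so later lines keep their original numbers.
    return "\n" * match.group(0).count("\n")


def find_divides(text):
    """Return (line_number, stripped_line) pairs that contain '/' or 'sqrt'."""
    stripped = _SKIP.sub(_blank, text)
    return [
        (number, line.strip())
        for number, line in enumerate(stripped.splitlines(), start=1)
        if "/" in line or "sqrt" in line
    ]
```

A "/" that appears only inside a comment or a string literal is not reported, because that text is removed before the lines are scanned.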
#!/usr/bin/python
"""find_divides.py
usage: python find_divides.py filename
"""
import re
import sys

def main():
    filename = sys.argv[1]
    text = open(filename).read()
    lines = text.splitlines()
    # Prefix each line with its (0-based) index so the number survives
    # the comment and string removal below.
    lines = ["%4d: %s" % (i, line) for (i, line) in enumerate(lines)]
    text = "\n".join(lines)
    text = remove_comments_and_strings(text)
    for line in text.splitlines():
        if ("/" in line) or ("sqrt" in line):
            print(line)

def remove_comments_and_strings(text):
    """ remove c-style comments and ...
How to remove C style comments using Python
The Perl FAQ has an entry, "How do I use a regular expression to strip C style comments from a file?" Since I've switched to Python, I've adapted the Perl solution to Python. This regular expression was created by Jeffrey Friedl and later modified by Fred Curtis. I'm not certain, but it appears to use the "unrolling the loop" technique described in Chapter 6 of Mastering Regular Expressions.
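For comparison, a much simpler pattern covers the common cases. The sketch below is a swapped-in alternation approach, not the Friedl/Curtis expression: it matches string and character literals first so that comment markers inside them are left alone, then drops block and line comments. The names _COMMENT_OR_LITERAL and the group name keep are assumptions made for this illustration.

```python
import re

_COMMENT_OR_LITERAL = re.compile(
    r"""
    (?P<keep> "(?:\\.|[^"\\])*" | '(?:\\.|[^'\\])*' )   # string or char literal, kept as-is
    | /\*.*?\*/                                           # block comment, dropped
    | //[^\n]*                                            # line comment, dropped
    """,
    re.VERBOSE | re.DOTALL,
)


def remove_comments(text):
    """Return text with C-style comments removed and literals untouched."""
    # When a comment matched, the 'keep' group is None, so it is replaced by "".
    return _COMMENT_OR_LITERAL.sub(lambda m: m.group("keep") or "", text)
```

One known limitation of this sketch: a comment is replaced by nothing rather than by a space, so two tokens separated only by a comment would be joined.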
remove_comments.py:
import re
import sys

def remove_comments(text):
    """ remove c-style comments.
        text: blob of text with comments (can include newlines)
        returns: text with comments removed
    """
    pattern = r"""
                        ## --------- COMMENT ...
Using Python's finditer to highlight search items
I am trying to search through various pieces of text and highlight certain search terms within that text using HTML markup. As an example, if I take a paragraph of text from Paul Prescod's essay, I would like to highlight the search terms "lisp", "python", "perl", "java", and "C", each in a different color. My first attempt at this problem looked something like:
for sentence in re.split(r"[?.]\s+", text):
    match = re.search(r"\blisp\b", sentence, re.I)
    if match:
        color = 'red'
    else:
        match = re.search(r"\bpython\b", sentence, re.I)
        if match:
            color = 'blue'
        else:
            match = re.search ...
Using Python's finditer for Lexical Analysis
Fredrik Lundh wrote a good article called "Using Regular Expressions for Lexical Analysis", which explains how to use Python regular expressions to read an input string and group characters into lexical units, or tokens. The author's first group of examples reads in a simple expression, "b = 2 + a*10", and outputs strings classified as one of three token types: symbols (e.g. a and b), integer literals (e.g. 2 and 10), and operators (e.g. =, +, and *). His first three examples use the findall method and his fourth example uses the undocumented scanner method from the re module. Here ...
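A minimal sketch of the same three-way classification using finditer is shown here. It is not code from Lundh's article: the pattern TOKEN_RE, the group names, and the tokenize function are assumptions made for illustration, and characters that match no group are silently skipped.

```python
import re

TOKEN_RE = re.compile(
    r"""
      (?P<symbol>   [A-Za-z_]\w*)   # names such as a and b
    | (?P<integer>  \d+)            # integer literals such as 2 and 10
    | (?P<operator> [=+*/-])        # single-character operators
    | (?P<skip>     \s+)            # whitespace, discarded below
    """,
    re.VERBOSE,
)


def tokenize(expression):
    """Return a list of (token_type, text) pairs for the expression."""
    tokens = []
    for m in TOKEN_RE.finditer(expression):
        # lastgroup is the name of the alternative that matched.
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

Using named groups with m.lastgroup avoids a chain of if/else tests to work out which alternative matched.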
Python finditer regular expression example
I often process text line by line using the splitlines() method with a for loop. This works well most of the time, but sometimes the text is not neatly divisible into lines, or I need to match multiple items per line. This is where the re module's finditer function can help. finditer returns an iterator over all non-overlapping matches for the regular expression pattern in the string. (See docs.) It is a powerful tool for text processing and one that I don't use often enough.
Here is a simple example which demonstrates the use of finditer. It reads ...
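As a hedged stand-in for that kind of use (not the example from the post itself), the sketch below pulls several key=value pairs out of text regardless of where the line breaks fall. The function name find_pairs and the sample data are assumptions for this illustration.

```python
import re


def find_pairs(text):
    """Return (key, value, start_offset) for every key=value pair in text."""
    return [
        (m.group(1), m.group(2), m.start())
        for m in re.finditer(r"(\w+)=(\w+)", text)
    ]


sample = "name=eliot lang=python\neditor=emacs"
for key, value, offset in find_pairs(sample):
    print("%-6s %-6s at offset %d" % (key, value, offset))
```

Each match object also carries its position in the string, which line-by-line processing does not provide directly.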
About
I'm Eliot and this is my notepad for programming topics such as Python, Django, Ubuntu, Emacs, etc.