Inspired by articles such as Why you should learn just a little Awk and Learn one sed command, I am trying to make use of Unix tools like awk and sed instead of writing short Python utility scripts.
Here is a Python script I wrote this week. It greps a file for a given regular expression pattern and returns a unique, sorted list of the matches found inside the capturing parentheses.
# grep2.py
import re
import sys

def main():
    patt = sys.argv[1]
    filename = sys.argv[2]
    text = open(filename).read()
    matchlist = set(m.group(1) for m in re.finditer(patt, text))
    for match in sorted(matchlist):
        print match

if __name__ == "__main__":
    main()
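To use, pass the pattern and the filename on the command line. For example (a hypothetical pattern and filename of my own), to list every distinct quoted href target in an HTML file:

python grep2.py 'href="([^"]+)"' index.html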
The following Python script searches through C code for division or sqrt and prints the line of code and the line number. It skips C comments.
To use, run
python find_divides.py filename.c
#!/usr/bin/python
"""find_divides.py
usage: python find_divides.py filename
"""
import re
import sys

def main():
    filename = sys.argv[1]
    text = open(filename).read()
    lines = text.splitlines()
    lines = ["%4d: %s" % (i, line) for (i, line) in enumerate(lines)]
    text = "\n".join(lines)
    text = remove_comments_and_strings(text)
    for line in text.splitlines():
        if ("/" in line) or ("sqrt" in line):
            print line

def remove_comments_and_strings(text):
    """remove c-style comments and ...
The Perl FAQ has an entry, How do I use a regular expression to strip C style comments from a file? Since I've switched to Python, I've adapted the Perl solution. The regular expression was created by Jeffrey Friedl and later modified by Fred Curtis. I'm not certain, but it appears to use the "unrolling the loop" technique described in Chapter 6 of Mastering Regular Expressions.
import re
import sys

def remove_comments(text):
    """remove c-style comments.
    text: blob of text with comments (can include newlines)
    returns: text with comments removed
    """
    pattern = r"""
        ## --------- COMMENT ...
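Since the excerpt above is cut off, here is a condensed, runnable sketch of the same idea (my own simplification, not the Friedl/Curtis unrolled pattern): a single alternation matches either a comment or a string literal, and the substitution keeps the strings while dropping the comments.

import re

# Simplified stand-in for the truncated pattern above: comments match
# with group 1 empty and are dropped; string/char literals are kept.
_c_pattern = re.compile(
    r'//.*?$'                    # // line comments
    r'|/\*.*?\*/'                # /* ... */ block comments
    r'|("(?:\\.|[^"\\])*"'       # "..." string literals (kept)
    r"|'(?:\\.|[^'\\])*')",      # '...' character literals (kept)
    re.DOTALL | re.MULTILINE)

def strip_c_comments(text):
    # Comments leave group 1 empty, so they are replaced by ""
    return _c_pattern.sub(lambda m: m.group(1) or "", text)

print strip_c_comments('int a = 1; /* half */ char *s = "/* not a comment */";')

The Friedl/Curtis pattern achieves the same effect without the non-greedy matching used here (that is the "unrolling the loop" part), but the trick of matching string literals so they survive intact is the same.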
I am trying to search through various text and highlight certain search terms within that text using HTML markup. As an example, if I take a paragraph of text from Paul Prescod's essay, I would like to highlight the search terms "lisp", "python", "perl", "java", and "C", each in a different color. My first attempt at this problem looked something like:
for sentence in re.split(r"[?.]\s+", text):
    match = re.search(r"\blisp\b", sentence, re.I)
    if match:
        color = 'red'
    else:
        match = re.search(r"\bpython\b", sentence, re.I)
        if match:
            color = 'blue'
        else:
            match = re.search ...
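For contrast, here is one way to collapse that cascade into a single pattern (a sketch of my own, not necessarily the post's eventual solution; the COLORS map and highlight function are hypothetical names): put all the terms in one alternation and let a substitution callback look up the color.

import re

# Hypothetical term-to-color map for the search terms listed above
COLORS = {'lisp': 'red', 'python': 'blue', 'perl': 'green',
          'java': 'orange', 'c': 'purple'}

# One alternation covers every term; \b keeps "c" from matching
# inside other words
term_re = re.compile(r'\b(%s)\b' % '|'.join(COLORS), re.I)

def highlight(text):
    # Wrap each matched term in a <span> colored via the lookup table
    return term_re.sub(
        lambda m: '<span style="color: %s">%s</span>'
                  % (COLORS[m.group(1).lower()], m.group(1)),
        text)

print highlight("Python is not Lisp, and C is not Java.")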
Fredrik Lundh wrote a good article called Using Regular Expressions for Lexical Analysis which explains how to use Python regular expressions to read an input string and group characters into lexical units, or tokens. The author's first group of examples reads in a simple expression, "b = 2 + a*10", and outputs strings classified as one of three token types: symbols (e.g. b), integer literals (e.g. 10), and operators (e.g. *). His first three examples use the findall method and his fourth example uses the undocumented scanner method from the re module. Here ...
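Since the excerpt is cut off, here is a minimal findall-based tokenizer in the same spirit (my own sketch, not Lundh's code): one capturing group per token type, and whichever group is non-empty tells you the type.

import re

expr = "b = 2 + a*10"

# One group per token type: symbol, integer literal, operator
token_re = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+)|(.))")

for symbol, integer, operator in token_re.findall(expr):
    if symbol:
        print "symbol:", symbol
    elif integer:
        print "integer literal:", integer
    else:
        print "operator:", operator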
I often process text line by line using the splitlines() method with a for loop. This works great most of the time; however, sometimes the text is not neatly divisible into lines, or I need to match multiple items per line. This is where the finditer function can help. finditer returns an iterator over all non-overlapping matches for the regular expression pattern in the string. (See the docs.) It is a powerful tool for text processing and one that I don't use often enough. Here is a simple example which demonstrates the use of finditer. It reads ...
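As a stand-in for the truncated example, here is a minimal finditer sketch (the input text and pattern are mine, not the original post's): it pulls every key=value pair out of a blob that is not neatly line-oriented and reports each match with its offset.

import re

text = "width=1200 height=800; dpi=96\nmode=rgb"

# finditer yields match objects, so we get the groups *and* the positions
for m in re.finditer(r"(\w+)=(\w+)", text):
    print "%s -> %s (at offset %d)" % (m.group(1), m.group(2), m.start())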