
SaltyCrane Blog

Notes on Python, Django, and web development on Ubuntu Linux

     Posts tagged "regexes"

(Not too successfully) trying to use Unix tools instead of Python utility scripts

Inspired by articles such as "Why you should learn just a little Awk" and "Learn one sed command", I am trying to use Unix tools such as sed, awk, grep, cut, uniq, and sort instead of writing short Python utility scripts.

Here is a Python script I wrote this week. It greps a file for a given regular expression pattern and returns a unique, sorted list of the matches inside the capturing parentheses.

# grep2.py

import re
import sys


def main():
    patt = sys.argv[1]
    filename = sys.argv[2]

    text = open(filename).read()
    matchlist = set(m.group(1) for m in ...
... read more »
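
Since the excerpt above is cut off, here is a minimal sketch of the idea it describes, not the original script: collect the first capturing group of every match, de-duplicate the results, and sort them. The function name `unique_sorted_matches` and the sample text are assumptions made for illustration.

```python
import re


def unique_sorted_matches(pattern, text):
    """Return the sorted, de-duplicated group(1) captures of pattern in text.

    The pattern must contain at least one capturing group.
    """
    return sorted(set(m.group(1) for m in re.finditer(pattern, text)))


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the original post
    sample = "id=42 id=7 id=42 name=x"
    print(unique_sorted_matches(r"id=(\d+)", sample))
```

Note that the captures are strings, so the sort is lexical rather than numeric.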

How to search C code for division or sqrt

The following Python script searches through C code for division or sqrt and prints the line of code and the line number. It skips C comments. To use, run python find_divides.py filename.c

#!/usr/bin/python

"""find_divides.py

usage: python find_divides.py filename
"""

import re
import sys

def main():
    filename = sys.argv[1]
    text = open(filename).read()
    lines = text.splitlines()
    # Number lines starting from 1 so the output matches editor line numbers
    lines = ["%4d: %s" % (i, line) for (i, line) in enumerate(lines, 1)]
    text = "\n".join(lines)
    text = remove_comments_and_strings(text)

    for line in text.splitlines():
        if ("/" in line) or ("sqrt" in line):
            print(line)

def remove_comments_and_strings(text):
    """ remove c-style comments and ...
... read more »

How to remove C style comments using Python

The Perl FAQ has an entry titled "How do I use a regular expression to strip C style comments from a file?" Since I've switched to Python, I've adapted the Perl solution to Python. This regular expression was created by Jeffrey Friedl and later modified by Fred Curtis. I'm not certain, but it appears to use the "unrolling the loop" technique described in Chapter 6 of Mastering Regular Expressions.

remove_comments.py:

import re
import sys

def remove_comments(text):
    """ remove c-style comments.
        text: blob of text with comments (can include newlines)
        returns: text with comments removed
    """
    pattern = r"""
                            ##  --------- COMMENT ...
... read more »
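
The full pattern is not shown in this excerpt, so as a rough illustration of the general idea only, here is a much simpler sketch. It is not the Friedl/Curtis expression: it uses a plain non-greedy match, removes only `/* ... */` comments, and does not protect string literals or handle `//` comments. The names `BLOCK_COMMENT` and `strip_block_comments` are made up for this example.

```python
import re

# Simplified pattern (an assumption for illustration, NOT the Friedl/Curtis regex).
# re.DOTALL lets "." match newlines so multi-line comments are covered.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_block_comments(text):
    """Remove /* ... */ comments, including ones that span several lines."""
    return BLOCK_COMMENT.sub("", text)


if __name__ == "__main__":
    code = "int a; /* one */ int b; /* two\nlines */ int c;"
    print(strip_block_comments(code))
```

Because this version knows nothing about quoting, a comment-like sequence inside a C string literal is removed too, which is one reason the longer pattern in the original post exists.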

Using Python's finditer to highlight search items

I am trying to search through various pieces of text and highlight certain search terms within that text using HTML markup. As an example, if I take a paragraph of text from Paul Prescod's essay, I would like to highlight the search terms "lisp", "python", "perl", "java", and "C", each in a different color. My first attempt at this problem looked something like:

for sentence in re.split(r"[?.]\s+", text):
    match = re.search(r"\blisp\b", sentence, re.I)
    if match:
        color = 'red'
    else:
        match = re.search(r"\bpython\b", sentence, re.I)
        if match:
            color = 'blue'
        else:
            match = re.search ...
... read more »
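
The nested if/else chain above grows with every search term. One possible alternative, sketched here under my own assumptions rather than taken from the full post, is a single alternation pattern walked with `finditer`, with a dictionary lookup for the color. The color choices, the `<span>` markup, and the names `COLORS`, `PATTERN`, and `highlight` are all hypothetical, and the input is assumed to be HTML-safe already.

```python
import re

# Hypothetical term-to-color mapping (keys are lowercase)
COLORS = {
    "lisp": "red",
    "python": "blue",
    "perl": "green",
    "java": "orange",
    "c": "purple",
}

# One pattern for all terms: \b(lisp|python|perl|java|c)\b, case-insensitive
PATTERN = re.compile(
    r"\b(%s)\b" % "|".join(re.escape(term) for term in COLORS),
    re.IGNORECASE,
)


def highlight(text):
    """Wrap each search term in a colored <span>, leaving other text as-is."""
    pieces = []
    last = 0
    for m in PATTERN.finditer(text):
        pieces.append(text[last:m.start()])  # text before this match
        color = COLORS[m.group(1).lower()]
        pieces.append('<span style="color: %s">%s</span>' % (color, m.group(1)))
        last = m.end()
    pieces.append(text[last:])  # trailing text after the final match
    return "".join(pieces)


if __name__ == "__main__":
    print(highlight("I like Python and Lisp."))
```

Adding a search term then means adding one dictionary entry instead of another level of nesting.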

Using Python's finditer for Lexical Analysis

Fredrik Lundh wrote a good article called "Using Regular Expressions for Lexical Analysis", which explains how to use Python regular expressions to read an input string and group characters into lexical units, or tokens. The author's first group of examples reads in a simple expression, "b = 2 + a*10", and outputs strings classified as one of three token types: symbols (e.g. a and b), integer literals (e.g. 2 and 10), and operators (e.g. =, +, and *). His first three examples use the findall method and his fourth example uses the undocumented scanner method from the re module. Here ...

... read more »
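
As a rough sketch of how `finditer` could be applied to the same expression (my own illustration, not code from Fredrik Lundh's article or from the full post), each token type can be given a named group, and `m.lastgroup` then reports which alternative matched. The group names and the `tokenize` function are assumptions.

```python
import re

# One named group per token type; re.VERBOSE allows the layout and comments.
TOKEN = re.compile(r"""
    (?P<symbol>[A-Za-z_]\w*)   # identifiers such as a and b
  | (?P<integer>\d+)           # integer literals such as 2 and 10
  | (?P<operator>[=+*])        # the operators =, + and *
""", re.VERBOSE)


def tokenize(expr):
    """Return a list of (token_type, token_text) pairs for expr."""
    # finditer silently skips anything the pattern does not match,
    # which here means whitespace (and any unexpected characters).
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(expr)]


if __name__ == "__main__":
    for kind, value in tokenize("b = 2 + a*10"):
        print("%-8s %s" % (kind, value))
```

A stricter tokenizer would compare each match's start offset with the previous match's end to detect characters that were skipped.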

Python finditer regular expression example

I often process text line by line using the splitlines() method with a for loop. This works great most of the time. Sometimes, however, the text is not neatly divisible into lines, or I need to match multiple items per line. This is where the re module's finditer function can help. finditer returns an iterator over all non-overlapping matches for the regular expression pattern in the string. (See docs.) It is a powerful tool for text processing and one that I don't use often enough.

Here is a simple example which demonstrates the use of finditer. It reads ...

... read more »
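
The example from the full post is not included in this excerpt. As a stand-in, here is a small sketch of the situation described above: several items per line, with matching done across the whole string rather than line by line. The `key=value` format, the sample text, and the function name `find_pairs` are assumptions for illustration.

```python
import re

# Hypothetical format: whitespace-separated key=value items, several per line
PAIR = re.compile(r"(?P<key>\w+)=(?P<value>\d+)")


def find_pairs(text):
    """Return (key, value, offset) for every key=value item in text."""
    # Each match object carries the captured groups and the match position.
    return [(m.group("key"), int(m.group("value")), m.start())
            for m in PAIR.finditer(text)]


if __name__ == "__main__":
    sample = "alpha=1 beta=22\ngamma=333 delta=4444"
    for key, value, offset in find_pairs(sample):
        print("%-6s %5d  (offset %d)" % (key, value, offset))
```

Because `finditer` yields match objects lazily, the same loop works on large inputs without building an intermediate list of lines.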