Django Blog Project #12: Adding Pygments syntax highlighting

I've finally added automatic code highlighting to my blog. It uses Pygments to do the syntax highlighting and Beautiful Soup to find all the <pre> blocks to highlight. I still write my blog posts in HTML, but I now add a class attribute to my <pre> tags to specify the Pygments lexer to use. For example, for Python code, I use:

<pre class="python">
import this
def demo():
</pre>

Which turns into:

import this
def demo():

I bought James Bennett's book, Practical Django Projects, about a month ago, and it has good information about creating a blog with Django. It also documents techniques for syntax highlighting, which I used here. To summarize, I added a new attribute called body_highlighted to my Post model. Then I added a custom save() method that parses my original HTML with Beautiful Soup and highlights it with Pygments.
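
The Pygments call at the center of this is small. As a minimal sketch, assuming Pygments is installed and the snippet runs under Python 3 (the model code in this post targets the older Python 2 stack), a lexer is looked up by the same name that goes in the class attribute and passed to highlight() together with an HTML formatter:

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name

source = "import this\n"

# "python" is the same string that would go in <pre class="python">.
lexer = get_lexer_by_name("python")
formatter = HtmlFormatter()

# highlight() returns a string of HTML markup.
highlighted = highlight(source, lexer, formatter)
print(highlighted)
```

The result is a <div class="highlight"> wrapping a <pre> whose tokens sit in <span> elements with short CSS class names, so the colors come from a separate stylesheet rather than from the markup itself.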

Model changes

Here is the relevant code in ~/src/django/myblogsite/myblogapp/

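# Imports assumed for this excerpt (BeautifulSoup 3 and Python 2 era code)
from BeautifulSoup import BeautifulSoup
from django.db import models
from pygments import formatters, highlight, lexers


class Post(models.Model):
    # ...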
    body = models.TextField()
    body_highlighted = models.TextField(editable=False, blank=True)

    def save(self):
        self.body_highlighted = self.highlight_code(self.body)
        super(Post, self).save()

    def highlight_code(self, html):
        soup = BeautifulSoup(html)
        preblocks = soup.findAll('pre')
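        for pre in preblocks:
            # Only highlight blocks that name a lexer, e.g. <pre class="python">
            if pre.has_key('class'):
                code = ''.join([unicode(item) for item in pre.contents])
                code = self.unescape_html(code)
                lexer = lexers.get_lexer_by_name(pre['class'])
                formatter = formatters.HtmlFormatter()
                code_hl = highlight(code, lexer, formatter)
                # Swap the original <pre> for the highlighted markup
                pre.replaceWith(BeautifulSoup(code_hl))
        return unicode(soup)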

    def unescape_html(self, html):
        html = html.replace('&lt;', '<')
        html = html.replace('&gt;', '>')
        html = html.replace('&amp;', '&')
        return html
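
The order of the three replacements in unescape_html matters: &amp; has to be decoded last, otherwise text that was deliberately double-escaped gets decoded twice. The sketch below illustrates this with a standalone Python 3 copy of the function; unescape_amp_first is a hypothetical variant written only for contrast, and html.unescape from the standard library is shown as a possible replacement on Python 3.

```python
import html


def unescape_html(text):
    """Decode the three entities used in the post, with &amp; handled last."""
    text = text.replace("&lt;", "<")
    text = text.replace("&gt;", ">")
    text = text.replace("&amp;", "&")
    return text


def unescape_amp_first(text):
    """Same replacements in the wrong order, shown only for contrast."""
    text = text.replace("&amp;", "&")
    text = text.replace("&lt;", "<")
    text = text.replace("&gt;", ">")
    return text


# To show a literal "&lt;" in code, the post source has to say "&amp;lt;".
escaped = "print('&amp;lt;')"

print(unescape_html(escaped))       # print('&lt;')
print(unescape_amp_first(escaped))  # print('<')
print(html.unescape(escaped))       # print('&lt;')
```

Decoding &amp; first turns "&amp;lt;" into "&lt;" and then into "<", which is not what was written. Unlike the three-entity version, html.unescape also decodes every other named and numeric entity.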

Update 2010-04-09: I added the unescape_html method so that I could highlight Python code with regular expression named groups. For example:

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")

With the new fix in place, I just need to escape the < and > characters with &lt; and &gt; and the syntax highlighting will display correctly. Before I made the fix, if I did not escape the characters, BeautifulSoup would add closing tags to what it thought was my malformed HTML. So instead of the above, it looked like this:

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")</last_name></first_name>

If anyone knows of a better solution, please let me know.
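
One possible alternative, offered as a sketch rather than a drop-in replacement: the newer Beautiful Soup 4 package (bs4) decodes entities when it parses, so get_text() on a <pre> block already returns the unescaped code and no separate unescape step is needed. The version below assumes Python 3 with bs4 and Pygments installed; it is a standalone function rather than a model method, and skipping unknown lexer names is a choice made for this example.

```python
from bs4 import BeautifulSoup
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name
from pygments.util import ClassNotFound


def highlight_code(html):
    """Replace each <pre class="lexer"> block with Pygments-highlighted HTML."""
    soup = BeautifulSoup(html, "html.parser")
    for pre in soup.find_all("pre"):
        classes = pre.get("class")  # bs4 returns a list for class, or None
        if not classes:
            continue
        try:
            lexer = get_lexer_by_name(classes[0])
        except ClassNotFound:
            continue  # unknown lexer name: leave the block untouched
        code = pre.get_text()  # entities such as &lt; come back decoded
        code_hl = highlight(code, lexer, HtmlFormatter())
        # Pygments wraps its output in a <div>; move that <div> into the tree.
        fragment = BeautifulSoup(code_hl, "html.parser")
        pre.replace_with(fragment.div)
    return str(soup)


result = highlight_code('<p>Intro</p><pre class="python">import this</pre>')
print(result)
```

The post source still has to escape < and > inside <pre> blocks, as described above, so that the parser does not mistake them for tags.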

Update the database
  • List the SQL commands Django would use:
    $ cd ~/src/django/myblogsite/
    $ python manage.py sqlall myblogapp
    CREATE TABLE "myblogapp_post" (
        "id" integer NOT NULL PRIMARY KEY,
        "author_id" integer NOT NULL REFERENCES "auth_user" ("id"),
        "title" varchar(200) NOT NULL,
        "slug" varchar(200) NOT NULL,
        "date_created" datetime NOT NULL,
        "date_modified" datetime NOT NULL,
        "tags" varchar(200) NOT NULL,
        "body" text NOT NULL,
        "body_highlighted" text NOT NULL
    );
    CREATE INDEX "myblogapp_post_author_id" ON "myblogapp_post" ("author_id");
    CREATE INDEX "myblogapp_post_slug" ON "myblogapp_post" ("slug");
  • Enter the sqlite3 shell:
    $ sqlite3 mydatabase.sqlite3

    and enter the following statements:
    sqlite> ALTER TABLE myblogapp_post ADD COLUMN body_highlighted text;
    sqlite> .exit
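
The effect of that ALTER TABLE statement on existing rows can be checked with a small sketch using Python's built-in sqlite3 module. The table here is a simplified stand-in with only two columns, created in an in-memory database, so it is an illustration and not the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the existing table, with one post already saved.
conn.execute(
    "CREATE TABLE myblogapp_post (id integer PRIMARY KEY, body text NOT NULL)"
)
conn.execute("INSERT INTO myblogapp_post (body) VALUES (?)", ("<p>old post</p>",))

# The same statement entered in the sqlite3 shell above.
conn.execute("ALTER TABLE myblogapp_post ADD COLUMN body_highlighted text")

row = conn.execute("SELECT body, body_highlighted FROM myblogapp_post").fetchone()
print(row)  # ('<p>old post</p>', None)
conn.close()
```

Posts that existed before the change have no value in body_highlighted until they are saved again, which is why it helps for the template to fall back to post.body when the highlighted version is empty.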
Update the template

Here is the relevant code in ~/src/django/myblogsite/templates/singlepost.html:

    {% if post.body_highlighted %}
      {{ post.body_highlighted|safe }}
    {% else %}
      {{ post.body|safe }}
    {% endif %}
Add CSS for Pygments

One last step is to add the CSS for Pygments. Here is an excerpt from my ~/src/django/myblogsite/media/css/mystyle.css:
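
Pygments can also generate a stylesheet itself. As a sketch, assuming Pygments is installed and Python 3 is used, HtmlFormatter.get_style_defs() returns the CSS rules for a given style as a string, which can then be pasted into or appended to a stylesheet. The "default" style name and the ".highlight" selector are choices made for this example:

```python
from pygments.formatters import HtmlFormatter

# ".highlight" matches the default wrapper <div class="highlight"> that
# HtmlFormatter() emits; "default" is one of the styles bundled with Pygments.
css = HtmlFormatter(style="default").get_style_defs(".highlight")
print(css)
```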

All pau. Now we should have pretty syntax highlighted code! (For those keeping track, this is now version 0.1.3 of my blog.)


#1 mapleoin commented:

This post was excellent. Thanks!

There's one problem in the view (mentioned in the Django docs). I found that modifying the save method to look like this works:

def save(self, force_insert=False, force_update=False):
    self.body_highlighted = self.highlight_code(self.body)
    # Pass the arguments through so Django's own save() still receives them
    super(Post, self).save(force_insert, force_update)

#2 Kenny Meyer commented:

Thanks for this great post. I'm trying to implement Syntax Highlighting with pygments and you have provided a good base. I have a very similar set-up, though I guess this is written for an older Django version, because you're still using the unicode() wrapper.

Keep it up!