SaltyCrane Blog — Notes on JavaScript and web development

Django Blog Project #12: Adding Pygments syntax highlighting

I've finally added automatic code highlighting to my blog. It uses Pygments to do the syntax highlighting and Beautiful Soup to find all the <pre> blocks to highlight. I still write my blog posts in HTML, but I now add a class attribute to my <pre> tags to specify the Pygments lexer to use. For example, for Python code, I use:

<pre class="python">
import this
def demo():
    pass</pre>

Which turns into:

import this
def demo():
    pass

I bought James Bennett's book, Practical Django Projects, about a month ago, and it has good information about creating a blog with Django. It also documents techniques for syntax highlighting, which I used here. To summarize, I added a new attribute called body_highlighted to my Post model. Then I added a custom save() method that parses my original HTML with Beautiful Soup and highlights it with Pygments.
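
As a rough illustration of the Pygments half of that approach, here is a minimal standalone sketch. It assumes Python 3 and a current Pygments install rather than the Python 2 setup this post was written on, and the sample string is just the example from the top of the post.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name

# Sample source, same as the <pre class="python"> example above
code = "import this\ndef demo():\n    pass\n"

# The lexer name is the value of the <pre> tag's class attribute
lexer = get_lexer_by_name("python")

# HtmlFormatter wraps the result in <div class="highlight"><pre>...</pre></div>
# and puts each token in a <span> with a short CSS class
highlighted = highlight(code, lexer, HtmlFormatter())
print(highlighted)
```

The short class names on those spans are what the stylesheet has to provide colors for.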

Model changes

Here is the relevant code in ~/src/django/myblogsite/myblogapp/models.py:

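# Imports assumed for BeautifulSoup 3, Django, and Pygments
from BeautifulSoup import BeautifulSoup
from django.db import models
from pygments import formatters, highlight, lexers

class Post(models.Model):
    # ...
    body = models.TextField()
    body_highlighted = models.TextField(editable=False, blank=True)

    def save(self, *args, **kwargs):
        # Pass through whatever arguments Django hands to save()
        # (e.g. force_insert, force_update)
        self.body_highlighted = self.highlight_code(self.body)
        super(Post, self).save(*args, **kwargs)

    def highlight_code(self, html):
        soup = BeautifulSoup(html)
        preblocks = soup.findAll('pre')
        for pre in preblocks:
            # Only highlight <pre> blocks whose class names a lexer
            if pre.has_key('class'):
                try:
                    code = ''.join([unicode(item) for item in pre.contents])
                    code = self.unescape_html(code)
                    lexer = lexers.get_lexer_by_name(pre['class'])
                    formatter = formatters.HtmlFormatter()
                    code_hl = highlight(code, lexer, formatter)
                    pre.replaceWith(BeautifulSoup(code_hl))
                except Exception:
                    # On failure (e.g. unknown lexer name), leave the block as is
                    pass
        return unicode(soup)

    def unescape_html(self, html):
        # Unescape &amp; last so that "&amp;lt;" becomes "&lt;", not "<"
        html = html.replace('&lt;', '<')
        html = html.replace('&gt;', '>')
        html = html.replace('&amp;', '&')
        return html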

Update 2010-04-09: I added the unescape_html method so that I could highlight Python code with regular expression named groups. For example:

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")

With the new fix in place, I just need to escape the < and > characters with &lt; and &gt; and the syntax highlighting will display correctly. Before I made the fix, if I did not escape the characters, BeautifulSoup would add closing tags to what it thought was my malformed HTML. So instead of the above, it looked like this:

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")</last_name></first_name>

If anyone knows of a better solution, please let me know.
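
One possible alternative, sketched here under the assumption of Python 3 (this blog ran on Python 2 at the time): the standard library's html.unescape handles named and numeric entities in a single pass, so the replacement-order question goes away. The function name just mirrors the unescape_html method above. This is a sketch, not what the blog code does.

```python
import html


def unescape_html(text):
    """Turn entities such as &lt; &gt; &amp; back into literal characters."""
    # html.unescape makes one pass, so "&amp;lt;" becomes "&lt;", not "<"
    return html.unescape(text)


escaped = 'm = re.match(r"(?P&lt;first_name&gt;\\w+) (?P&lt;last_name&gt;\\w+)", "Malcolm Reynolds")'
print(unescape_html(escaped))
```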

Update the database
  • List the SQL commands Django would use:
    $ cd ~/src/django/myblogsite/
    $ python manage.py sqlall myblogapp
    
    BEGIN;
    CREATE TABLE "myblogapp_post" (
        "id" integer NOT NULL PRIMARY KEY,
        "author_id" integer NOT NULL REFERENCES "auth_user" ("id"),
        "title" varchar(200) NOT NULL,
        "slug" varchar(200) NOT NULL,
        "date_created" datetime NOT NULL,
        "date_modified" datetime NOT NULL,
        "tags" varchar(200) NOT NULL,
        "body" text NOT NULL,
        "body_highlighted" text NOT NULL
    )
    ;
    CREATE INDEX "myblogapp_post_author_id" ON "myblogapp_post" ("author_id");
    CREATE INDEX "myblogapp_post_slug" ON "myblogapp_post" ("slug");
    COMMIT;
  • Enter the sqlite3 shell:
    $ sqlite3 mydatabase.sqlite3

    and enter the following statements:
    sqlite> ALTER TABLE myblogapp_post ADD COLUMN body_highlighted text;
    sqlite> .exit
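
The same ALTER TABLE step can be tried out from Python with the standard library's sqlite3 module. This is only a sketch: it uses a throwaway in-memory database and a cut-down two-column version of the table (both are assumptions for illustration), not the real mydatabase.sqlite3 file.

```python
import sqlite3

# Throwaway in-memory database standing in for mydatabase.sqlite3
conn = sqlite3.connect(":memory:")

# Cut-down version of the table as it looked before the new field was added
conn.execute(
    "CREATE TABLE myblogapp_post ("
    "id integer NOT NULL PRIMARY KEY, body text NOT NULL)"
)
conn.execute(
    "INSERT INTO myblogapp_post (body) VALUES (?)",
    ('<pre class="python">import this</pre>',),
)

# The same statement entered in the sqlite3 shell
conn.execute("ALTER TABLE myblogapp_post ADD COLUMN body_highlighted text")

# Column names after the change, and the new column's value for the old row
columns = [row[1] for row in conn.execute("PRAGMA table_info(myblogapp_post)")]
existing = conn.execute("SELECT body_highlighted FROM myblogapp_post").fetchone()[0]
print(columns, existing)
```

Rows that existed before the ALTER TABLE hold NULL in the new column until the post is saved again, so a fallback to post.body in the template is useful for those rows.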

Update the template

Here is the relevant code in ~/src/django/myblogsite/templates/singlepost.html:

    {% if post.body_highlighted %}
      {{ post.body_highlighted|safe }}
    {% else %}
      {{ post.body|safe }}
    {% endif %}

Add CSS for Pygments

One last step is to add the CSS for Pygments. Here is an excerpt from my ~/src/django/myblogsite/media/css/mystyle.css:

/* PYGMENTS STYLE */
/* customized */
.c  { color: #008040; font-style: italic } /* Comment */
.cm { color: #008040; font-style: italic } /* Comment.Multiline */
.cp { color: #BC7A00 } /* Comment.Preproc */
.c1 { color: #008040; font-style: italic } /* Comment.Single */
.cs { color: #008040; font-style: italic } /* Comment.Special */
.gd { color: grey; text-decoration: line-through } /* Generic.Deleted */
.gi { color: red; } /* Generic.Inserted */
.k  { color: #000080; font-weight: bold } /* Keyword */
.kc { color: #000000; font-weight: bold } /* Keyword.Constant */
.kd { color: #000000; font-weight: bold } /* Keyword.Declaration */
.kp { color: #000000 } /* Keyword.Pseudo */
.kr { color: #000000; font-weight: bold } /* Keyword.Reserved */
.kt { color: #000000; font-weight: bold } /* Keyword.Type */

/* original settings */
.err { border: 1px solid #FF0000 } /* Error */
.o { color: #666666 } /* Operator */
.ge { font-style: italic } /* Generic.Emph */
.gr { color: #FF0000 } /* Generic.Error */
.gh { color: #000080; font-weight: bold } /* Generic.Heading */
.go { color: #808080 } /* Generic.Output */
.gp { color: #000080; font-weight: bold } /* Generic.Prompt */
.gs { font-weight: bold } /* Generic.Strong */
.gu { color: #800080; font-weight: bold } /* Generic.Subheading */
.gt { color: #0040D0 } /* Generic.Traceback */
.m { color: #666666 } /* Literal.Number */
.s { color: #BA2121 } /* Literal.String */
.na { color: #7D9029 } /* Name.Attribute */
.nb { color: #008000 } /* Name.Builtin */
.nc { color: #0000FF; font-weight: bold } /* Name.Class */
.no { color: #880000 } /* Name.Constant */
.nd { color: #AA22FF } /* Name.Decorator */
.ni { color: #999999; font-weight: bold } /* Name.Entity */
.ne { color: #D2413A; font-weight: bold } /* Name.Exception */
.nf { color: #0000FF } /* Name.Function */
.nl { color: #A0A000 } /* Name.Label */
.nn { color: #0000FF; font-weight: bold } /* Name.Namespace */
.nt { color: #008000; font-weight: bold } /* Name.Tag */
.nv { color: #19177C } /* Name.Variable */
.ow { color: #AA22FF; font-weight: bold } /* Operator.Word */
.w { color: #bbbbbb } /* Text.Whitespace */
.mf { color: #666666 } /* Literal.Number.Float */
.mh { color: #666666 } /* Literal.Number.Hex */
.mi { color: #666666 } /* Literal.Number.Integer */
.mo { color: #666666 } /* Literal.Number.Oct */
.sb { color: #BA2121 } /* Literal.String.Backtick */
.sc { color: #BA2121 } /* Literal.String.Char */
.sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */
.s2 { color: #BA2121 } /* Literal.String.Double */
.se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */
.sh { color: #BA2121 } /* Literal.String.Heredoc */
.si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */
.sx { color: #008000 } /* Literal.String.Other */
.sr { color: #BB6688 } /* Literal.String.Regex */
.s1 { color: #BA2121 } /* Literal.String.Single */
.ss { color: #19177C } /* Literal.String.Symbol */
.bp { color: #008000 } /* Name.Builtin.Pseudo */
.vc { color: #19177C } /* Name.Variable.Class */
.vg { color: #19177C } /* Name.Variable.Global */
.vi { color: #19177C } /* Name.Variable.Instance */
.il { color: #666666 } /* Literal.Number.Integer.Long */
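
Instead of copying a stylesheet by hand, Pygments can also print the CSS for one of its built-in styles. A minimal sketch, assuming Python 3 and a current Pygments release; the "default" style name and the ".highlight" selector prefix are choices made for this example, and the output will not match the customized colors above.

```python
from pygments.formatters import HtmlFormatter

# Build a formatter for one of the built-in Pygments styles
formatter = HtmlFormatter(style="default")

# Prefix each rule with .highlight, the class HtmlFormatter puts on its wrapper <div>
css = formatter.get_style_defs(".highlight")
print(css)
```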

All pau. Now we should have pretty syntax highlighted code! (For those keeping track, this is now version 0.1.3 of my blog.)

Comments


#1 mapleoin commented:

This post was excellent. Thanks!

There's one problem in the view (mentioned in the Django docs here). I found that modifying the save method to look like this works:

def save(self, force_insert=False, force_update=False):
    self.body_highlighted = self.highlight_code(self.body)
    # Forward the arguments so Django's own save() still receives them
    super(Post, self).save(force_insert, force_update)

#2 Kenny Meyer commented:

Thanks for this great post. I'm trying to implement Syntax Highlighting with pygments and you have provided a good base. I have a very similar set-up, though I guess this is written for an older Django version, because you're still using the unicode() wrapper.

Keep it up!