SaltyCrane Blog — Notes on JavaScript and web development

Notes on parallel processing with Python and Twisted

Twisted is a networking engine written in Python that, among many other things, can be used to do parallel processing. It is very big, though, so I had a hard time finding what I needed. I browsed through the Twisted Documentation and the Twisted O'Reilly book. There is also a Recipe in the Python Cookbook. However, I found Bruce Eckel's article, Concurrency with Python, Twisted, and Flex, to be the most helpful. (See also Bruce Eckel's initial article on Twisted: Grokking Twisted.)

Here are my notes on running Bruce Eckel's example. I removed the Flex part because I didn't need it and don't know anything about it. This example runs a Controller which starts a number of separate parallel processes running Solvers (a.k.a. workers). It also allows for communication between the Controller and Solvers. Though this example only runs on one machine, the article said extending this to multiple machines is not difficult. For a good explanation of how this works, please see the original article.

Here is solver.py which is copied from the original article. The actual "work" is done in the step method. I only added some debugging print statements for myself.

"""
solver.py
Original version by Bruce Eckel
Solves one portion of a problem, in a separate process on a separate CPU
"""
import sys, random, math
from twisted.spread import pb
from twisted.internet import reactor

class Solver(pb.Root):

    def __init__(self, id):
        print "solver.py %s: solver init" % id
        self.id = id

    def __str__(self): # String representation
        return "Solver %s" % self.id

    def remote_initialize(self, initArg):
        return "%s initialized" % self

    def step(self, arg):
        "Simulate work and return result"
        print "solver.py %s: solver step" % self.id
        result = 0
        for i in range(random.randint(1000000, 3000000)):
            angle = math.radians(random.randint(0, 45))
            result += math.tanh(angle)/math.cosh(angle)
        return "%s, %s, result: %.2f" % (self, str(arg), result)

    # Alias methods, for demonstration version:
    remote_step1 = step
    remote_step2 = step
    remote_step3 = step

    def remote_status(self):
        print "solver.py %s: remote_status" % self.id
        return "%s operational" % self

    def remote_terminate(self):
        print "solver.py %s: remote_terminate" % self.id
        reactor.callLater(0.5, reactor.stop)
        return "%s terminating..." % self

if __name__ == "__main__":
    port = int(sys.argv[1])
    reactor.listenTCP(port, pb.PBServerFactory(Solver(sys.argv[1])))
    reactor.run()

Here is controller.py. This is also copied from the original article but I removed the Flex interface and created calls to start and terminate in the Controller class. I'm not sure if this makes sense, but at least this allowed me to run the example. I also moved the terminate method from the FlexInterface to the Controller.

"""
Controller.py
Original version by Bruce Eckel
Starts and manages solvers in separate processes for parallel processing.
"""
import sys
from subprocess import Popen
from twisted.spread import pb
from twisted.internet import reactor, defer

START_PORT = 5566
MAX_PROCESSES = 2

class Controller(object):

    def broadcastCommand(self, remoteMethodName, arguments, nextStep, failureMessage):
        print "controller.py: broadcasting..."
        deferreds = [solver.callRemote(remoteMethodName, arguments) 
                     for solver in self.solvers.values()]
        print "controller.py: broadcasted"
        reactor.callLater(3, self.checkStatus)

        defer.DeferredList(deferreds, consumeErrors=True).addCallbacks(
            nextStep, self.failed, errbackArgs=(failureMessage,))
    
    def checkStatus(self):
        print "controller.py: checkStatus"
        for solver in self.solvers.values():
            solver.callRemote("status").addCallbacks(
                lambda r: sys.stdout.write(r + "\n"), self.failed,
                errbackArgs=("Status Check Failed",))
                                                     
    def failed(self, results, failureMessage="Call Failed"):
        print "controller.py: failed"
        for (success, returnValue), (address, port) in zip(results, self.solvers):
            if not success:
                raise Exception("address: %s port: %d %s" % (address, port, failureMessage))

    def __init__(self):
        print "controller.py: init"
        self.solvers = dict.fromkeys(
            [("localhost", i) for i in range(START_PORT, START_PORT+MAX_PROCESSES)])
        self.pids = [Popen(["python", "solver.py", str(port)]).pid
                     for ip, port in self.solvers]
        print "PIDS: ", self.pids
        self.connected = False
        reactor.callLater(1, self.connect)

    def connect(self):
        print "controller.py: connect"
        connections = []
        for address, port in self.solvers:
            factory = pb.PBClientFactory()
            reactor.connectTCP(address, port, factory)
            connections.append(factory.getRootObject())
        defer.DeferredList(connections, consumeErrors=True).addCallbacks(
            self.storeConnections, self.failed, errbackArgs=("Failed to Connect",))

        print "controller.py: starting parallel jobs"
        self.start()

    def storeConnections(self, results):
        print "controller.py: storeconnections"
        for (success, solver), (address, port) in zip(results, self.solvers):
            self.solvers[address, port] = solver
        print "controller.py: Connected; self.solvers:", self.solvers
        self.connected = True

    def start(self):
        "controller.py: Begin the solving process"
        if not self.connected:
            return reactor.callLater(0.5, self.start)
        self.broadcastCommand("step1", ("step 1"), self.step2, "Failed Step 1")

    def step2(self, results):
        print "controller.py: step 1 results:", results
        self.broadcastCommand("step2", ("step 2"), self.step3, "Failed Step 2")

    def step3(self, results):
        print "controller.py: step 2 results:", results
        self.broadcastCommand("step3", ("step 3"), self.collectResults, "Failed Step 3")

    def collectResults(self, results):
        print "controller.py: step 3 results:", results
        self.terminate()
        
    def terminate(self):
        print "controller.py: terminate"
        for solver in self.solvers.values():
            solver.callRemote("terminate").addErrback(self.failed, "Termination Failed")
        reactor.callLater(1, reactor.stop)
        return "Terminating remote solvers"

if __name__ == "__main__":
    controller = Controller()
    reactor.run()
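
One Python detail matters for the errbackArgs values passed to addCallbacks: they are forwarded to the errback as extra positional arguments, so they need to be a tuple, and parentheses alone do not make one. Here is a minimal sketch of the difference, written for Python 3. The call_errback helper is a made-up stand-in for the forwarding step, not Twisted's actual code.

```python
def call_errback(errback, failure, errback_args=()):
    """Hypothetical stand-in: forward extra positional arguments to an errback."""
    return errback(failure, *errback_args)


def failed(failure, failure_message="Call Failed"):
    """Simplified errback that just formats its arguments."""
    return "%s: %s" % (failure_message, failure)


not_a_tuple = ("Failed Step 1")   # parentheses only: this is still a string
one_tuple = ("Failed Step 1",)    # the trailing comma makes a one-item tuple

print(type(not_a_tuple).__name__)
print(type(one_tuple).__name__)
print(call_errback(failed, "boom", one_tuple))

try:
    # The string is unpacked character by character, so failed() receives
    # far more positional arguments than it accepts.
    call_errback(failed, "boom", not_a_tuple)
except TypeError as exc:
    print("TypeError:", exc)
```

With the string form, the mistake only shows up when an errback actually fires, which makes it easy to miss while everything is succeeding.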

To run it, put the two files in the same directory and run python controller.py. You should see 2 CPUs (if you have 2) go up to 100% usage. And here is the screen output:

controller.py: init
PIDS:  [12173, 12174]
solver.py 5567: solver init
solver.py 5566: solver init
controller.py: connect
controller.py: starting parallel jobs
controller.py: storeconnections
controller.py: Connected; self.solvers: {('localhost', 5567): , ('localhost', 5566): }
controller.py: broadcasting...
controller.py: broadcasted
solver.py 5566: solver step
solver.py 5567: solver step
controller.py: checkStatus
solver.py 5566: remote_status
Solver 5566 operational
solver.py 5567: remote_status
controller.py: step 1 results: [(True, 'Solver 5567, step 1, result: 683825.75'), (True, 'Solver 5566, step 1, result: 543177.17')]
controller.py: broadcasting...
controller.py: broadcasted
Solver 5567 operational
solver.py 5566: solver step
solver.py 5567: solver step
controller.py: checkStatus
solver.py 5566: remote_status
Solver 5566 operational
solver.py 5567: remote_status
controller.py: step 2 results: [(True, 'Solver 5567, step 2, result: 636793.90'), (True, 'Solver 5566, step 2, result: 335358.16')]
controller.py: broadcasting...
controller.py: broadcasted
Solver 5567 operational
solver.py 5566: solver step
solver.py 5567: solver step
controller.py: checkStatus
solver.py 5566: remote_status
Solver 5566 operational
solver.py 5567: remote_status
controller.py: step 3 results: [(True, 'Solver 5567, step 3, result: 847386.43'), (True, 'Solver 5566, step 3, result: 512120.15')]
controller.py: terminate
Solver 5567 operational
solver.py 5566: remote_terminate
solver.py 5567: remote_terminate
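
The step results printed above are lists of (success, value) pairs, one pair per solver. Here is a small sketch, written for Python 3, of how such a list could be split into successes and failures. The split_results helper and the failing entry are made up for illustration; in a real run a failing entry would hold a Twisted Failure object rather than a string.

```python
def split_results(results):
    """Separate (success, value) pairs into succeeded and failed values."""
    succeeded = [value for success, value in results if success]
    failed = [value for success, value in results if not success]
    return succeeded, failed


# The first two entries are copied from the step 1 output above.
# The third is a hypothetical failure added for illustration.
results = [
    (True, 'Solver 5567, step 1, result: 683825.75'),
    (True, 'Solver 5566, step 1, result: 543177.17'),
    (False, 'connection lost'),
]

succeeded, failed = split_results(results)
print(len(succeeded), "succeeded;", len(failed), "failed")
```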

Notes on starting processes in Python

Using os.fork()

Here is an example using os.fork() to spawn 5 processes, each running the Python function myfunc. Don't forget the os._exit() at the end. Per the docs, sys.exit() is normally used, but os._exit() can be used in child processes after a fork. It exits without calling cleanup handlers, flushing stdio buffers, etc.

import os
import time

def myfunc(i):
    print "sleeping 5 seconds from process %s" % i
    time.sleep(5)
    print "finished sleeping from process %s" % i

for i in range(5):
    pid = os.fork()
    if pid == 0:
        myfunc(i)
        os._exit(0)

Results:

sleeping 5 seconds from process 0
sleeping 5 seconds from process 1
sleeping 5 seconds from process 2
sleeping 5 seconds from process 3
sleeping 5 seconds from process 4

And 5 seconds later...

finished sleeping from process 0
finished sleeping from process 1
finished sleeping from process 2
finished sleeping from process 3
finished sleeping from process 4
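
The loop above starts the children but never waits for them. If the parent needs to know when they are done, one option is to remember each pid and call os.waitpid() on it. This is a minimal sketch, written for Python 3 and assuming a POSIX system where os.fork() is available. The run_children helper is a name I made up for illustration.

```python
import os
import time


def run_children(count, seconds):
    """Fork `count` children, wait for each one, and return their exit codes."""
    pids = []
    for i in range(count):
        pid = os.fork()
        if pid == 0:
            # Child process: do the "work", then exit immediately.
            time.sleep(seconds)
            os._exit(0)
        # Parent process: remember the child's pid.
        pids.append(pid)

    exit_codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)  # blocks until this child exits
        exit_codes.append(os.WEXITSTATUS(status))
    return exit_codes


print(run_children(5, 0.2))
```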

Running an external script in subprocesses

Alternatively, if you want to run an external script in multiple processes, you can use the Popen class in the subprocess module. For example, to run the following script, called "myscript.py":

"myscript.py"
import sys
import time

def myfunc(i):
    print "sleeping 5 seconds from process %s" % i
    time.sleep(5)
    print "finished sleeping from process %s" % i

if __name__ == '__main__':
    myfunc(sys.argv[1])

use the following Python code stored in the same directory:

"popen_ex.py"
from subprocess import Popen

for i in range(5):
    Popen(['python', './myscript.py', str(i)])

The screen output is the same as the previous example. What's the difference? fork() copies the process memory space, including open file descriptors, to the child process. In the second example, since I am executing a new Python interpreter from scratch, I get a "cleaner" start but probably more overhead as well.
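
Popen returns as soon as the child has started. If the parent needs the output or exit codes, one option is to keep the Popen objects and call communicate() on each. This is a minimal sketch, written for Python 3. It uses an inline -c command as a stand-in for myscript.py, and run_scripts is a made-up helper name.

```python
import subprocess
import sys

# Inline stand-in for myscript.py: print the first command-line argument.
CHILD_CODE = "import sys; print(sys.argv[1])"


def run_scripts(count):
    """Start `count` interpreters in parallel and collect (returncode, output)."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE, str(i)],
            stdout=subprocess.PIPE,
        )
        for i in range(count)
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()  # waits for this child to finish
        results.append((proc.returncode, out.decode().strip()))
    return results


print(run_scripts(5))
```

Using sys.executable instead of the bare name 'python' runs the children with the same interpreter as the parent.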

Django Blog Project #15: New site logo

I now have a new site logo design drawn by my wife, Angela! Doesn't it look great? My previous logo was a crane picture I had just pulled from the web somewhere. So it is nice to have a custom logo done for me. Luckily my wife is artistic and didn't mind drawing it for me. I also made some minor changes to the title block to make things look a little better up there. Now to figure out how to style the rest of the page.

I also got a "memory over limit" warning from Webfaction this week. Over the weekend, I had redirected all my old Blogger posts to this blog, so apparently the small increase in traffic brought to light some of my inefficient code. To help solve the problem, I switched over to django-tagging. This eliminated a bunch of my inefficient code and I appear to be within the memory limits now. There is still another section of code I need to rework, but this solves the problem for now. Django-tagging is pretty cool-- I haven't quite got everything working correctly, but I will be sure to write some notes on it when I get the time.

Simplistic Python Thread example

Here is a simple Python example using the Thread object in the threading module.

import time
from threading import Thread

def myfunc(i):
    print "sleeping 5 sec from thread %d" % i
    time.sleep(5)
    print "finished sleeping from thread %d" % i

for i in range(10):
    t = Thread(target=myfunc, args=(i,))
    t.start()

Results:

sleeping 5 sec from thread 0
sleeping 5 sec from thread 1
sleeping 5 sec from thread 2
sleeping 5 sec from thread 3
sleeping 5 sec from thread 4
sleeping 5 sec from thread 5
sleeping 5 sec from thread 6
sleeping 5 sec from thread 7
sleeping 5 sec from thread 8
sleeping 5 sec from thread 9

...and 5 seconds later:

finished sleeping from thread 0
finished sleeping from thread 1
finished sleeping from thread 2
finished sleeping from thread 3
finished sleeping from thread 4
finished sleeping from thread 5
finished sleeping from thread 6
finished sleeping from thread 7
finished sleeping from thread 8
finished sleeping from thread 9
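
The example above starts the threads and lets the main script fall off the end. If the main thread needs to wait for the results, join() can be called on each thread. This is a minimal sketch, written for Python 3. The run_threads helper and the shared list are made up for illustration.

```python
import time
from threading import Lock, Thread


def run_threads(count, seconds):
    """Start `count` threads, wait for all of them, and return the ids that finished."""
    finished = []
    lock = Lock()

    def myfunc(i):
        time.sleep(seconds)
        with lock:  # guard the shared list
            finished.append(i)

    threads = [Thread(target=myfunc, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # blocks until that thread has finished
    return sorted(finished)


print(run_threads(10, 0.1))
```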

How to iterate over an instance object's data attributes in Python

To list the attributes of a Python instance object, I could use the built-in dir() function. However, this returns the instance object's methods as well as its data attributes. To get just the data attributes, I can use the instance object's __dict__ attribute:

class A(object):
    def __init__(self):
        self.myinstatt1 = 'one'
        self.myinstatt2 = 'two'
    def mymethod(self):
        pass

a = A()
for attr, value in a.__dict__.iteritems():
    print attr, value

Results:

myinstatt2 two
myinstatt1 one
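
For comparison, here is the same idea as a sketch in Python 3, where iteritems() no longer exists and items() is used instead. The built-in vars() returns the same dictionary as __dict__ for an instance like this one.

```python
class A(object):
    def __init__(self):
        self.myinstatt1 = 'one'
        self.myinstatt2 = 'two'

    def mymethod(self):
        pass


a = A()

# vars(a) is the instance's __dict__, so it holds only the data attributes.
attrs = dict(vars(a))
for attr, value in sorted(attrs.items()):
    print(attr, value)

# dir(a) also lists methods and inherited names.
print('mymethod' in dir(a), 'mymethod' in attrs)
```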

Django Blog Project #14: Running Django 1.0

I'm now running Django 1.0 for this blog. For those who haven't heard, Django 1.0 Final was released yesterday sometime. My last Django update had been shortly after Beta 2 was released and updating to 1.0 Final didn't require any code changes for me. (I think everything since then was supposed to be bug fixes.) (I also updated my work project to 1.0 with no problem.) That's about all the content for this post. Here are some links:

Here are my notes on my path to 1.0:


And here are some Django 1.0 links:


Django Blog Project #13: Updating Django 1.0 Beta 2 New Comments and Adding Markdown support

I've updated to Django 1.0 Beta 2. One of the big items for this release was the new commenting framework. I had been waiting for this, so I was excited to see it was finally done.

I also added support for Markdown formatting of my comments. I actually could have added this earlier, but I only recently learned that Django has built-in support for Markdown.

Update URLConf

When I glanced over the changes for the new commenting framework, I missed this change and I actually had to Google on my error message. Luckily, someone (I don't remember where I found it now) had run into the same problem and saved me.

~/src/django/myblogsite/urls.py:
--- a/urls.py   Thu Aug 21 10:05:20 2008 -0500
+++ b/urls.py   Mon Sep 01 22:34:16 2008 -0700
@@ -1,6 +1,6 @@
 from django.conf.urls.defaults import *
 from django.contrib import admin
-from django.contrib.comments.models import FreeComment
+from django.contrib.comments.models import Comment
 from iwiwdsmi.myblogapp.views import *
 from iwiwdsmi.feeds import *
 from iwiwdsmi.views import *
@@ -19,7 +19,7 @@
     (r'^admin/(.*)', admin.site.root),
     (r'^feeds/(?P<url>.*)/$', 'django.contrib.syndication.views.feed', 
      {'feed_dict': feeds}),
-    (r'^comments/', include('django.contrib.comments.urls.comments')),
+    (r'^comments/', include('django.contrib.comments.urls')),
     
     (r'^$', rootview),

Update the database

See the Django Upgrading from Django's previous comment system guide for more complete information.

  • I ran
    $ cd ~/src/django/myblogsite
    $ python manage.py syncdb
  • Then I entered my sqlite3 shell:
    $ sqlite3 mydatabase.sqlite3
    and pasted the following at the command prompt:
    BEGIN;
    
    INSERT INTO django_comments
        (content_type_id, object_pk, site_id, user_name, user_email, user_url,
        comment, submit_date, ip_address, is_public, is_removed)
    SELECT
        content_type_id, object_id, site_id, person_name, '', '', comment,
        submit_date, ip_address, is_public, approved
    FROM comments_freecomment;
    
    INSERT INTO django_comments
        (content_type_id, object_pk, site_id, user_id, user_name, user_email,
        user_url, comment, submit_date, ip_address, is_public, is_removed)
    SELECT
        content_type_id, object_id, site_id, user_id, '', '', '', comment,
        submit_date, ip_address, is_public, is_removed
    FROM comments_comment;
    
    UPDATE django_comments SET user_name = (
        SELECT username FROM auth_user
        WHERE django_comments.user_id = auth_user.id
    ) WHERE django_comments.user_id is not NULL;
    UPDATE django_comments SET user_email = (
        SELECT email FROM auth_user
        WHERE django_comments.user_id = auth_user.id
    ) WHERE django_comments.user_id is not NULL;
    
    COMMIT;
    then exited:
    .exit

Templates

The rest of the changes were with the templates.

  • I removed my old comments templates:
    rm -rf ~/src/django/myblogsite/templates/comments
  • I copied the new templates:
    cp -r ~/lib/django_trunk/django/contrib/comments ~/src/django/myblogsite/templates
  • I updated ~/src/django/myblogsite/templates/listpage.html:
    --- a/templates/listpage.html   Thu Aug 21 10:05:20 2008 -0500
    +++ b/templates/listpage.html   Mon Sep 01 22:46:34 2008 -0700
    @@ -47,7 +47,7 @@
           {% endfor %}
           &nbsp;&nbsp;|&nbsp;&nbsp;
     
    -      {% get_free_comment_count for myblogapp.post post.id as comment_count %}
    +      {% get_comment_count for myblogapp.post post.id as comment_count %}
           <a href="{{ post.get_absolute_url }}#comments">
             {{ comment_count|add:post.lc_count }} 
             Comment{{ comment_count|add:post.lc_count|pluralize}}</a>
    
  • I updated ~/src/django/myblogsite/templates/singlepost.html:
    --- a/templates/singlepost.html Thu Aug 21 10:05:20 2008 -0500
    +++ b/templates/singlepost.html Tue Sep 02 00:44:51 2008 -0700
    @@ -1,6 +1,7 @@
     {% extends "base.html" %}
     
     {% load comments %}
    +{% load markup %}
     
     {% block title %}
       {{ main_title }}: {{ post.title }}
    @@ -59,8 +60,8 @@
       {% endfor %}
       <br />
     
    -  {% get_free_comment_list for myblogapp.post post.id as comment_list %}
    -  {% get_free_comment_count for myblogapp.post post.id as comment_count %}
    +  {% get_comment_list for myblogapp.post post.id as comment_list %}
    +  {% get_comment_count for myblogapp.post post.id as comment_count %}
       {% if comment_list %}
         <h4>{{ comment_count }} 
         {% if lc_list %}New {% endif %}
    @@ -69,13 +70,19 @@
       {% for comment in comment_list %}
         <br />
         <a name="c{{ comment.id }}" href="#c{{ comment.id }}">#{{ forloop.counter }}</a>
    -    <b>{{ comment.person_name|escape }}</b> commented, 
    -      on {{ comment.submit_date|date:"F j, Y" }} at {{ comment.submit_date|date:"P" }}:
    -    {{ comment.comment|escape|urlizetrunc:40|linebreaks }}
    +    <b>
    +      {% if comment.url %}
    +        <a href="{{ comment.url }}">{{ comment.name|escape }}</a>
    +      {% else %}
    +        {{ comment.name|escape }}
    +      {% endif %}
    +    </b> commented, 
    +    on {{ comment.submit_date|date:"F j, Y" }} at {{ comment.submit_date|date:"P" }}:
    +    {{ comment.comment|markdown:"safe" }}
       {% endfor %}
       <br />
     
       </h4><h4>Post a comment</h4>
    -  {% free_comment_form for myblogapp.post post.id %}
    +  {% render_comment_form for post %}
     
     {% endblock %}
    
Add django.contrib.markup to INSTALLED_APPS

To use Markdown, I added django.contrib.markup to INSTALLED_APPS in ~/src/django/myblogsite/settings.py:

 INSTALLED_APPS = (
     'django.contrib.auth',
     'django.contrib.contenttypes',
     'django.contrib.sessions',
     'django.contrib.sites',
     'django.contrib.admin',
     'django.contrib.comments',
+    'django.contrib.markup',
     'iwiwdsmi.myblogapp',
 )

That was about it. I messed with the templates a little to try to make things a little prettier. I'm not completely satisfied yet though. My next step is to add django-openid support. Later on, I'd also like to add email notification and spam filtering.

Error messages:
got an unexpected keyword argument 'core'
Validating models...
Unhandled exception in thread started by 
Traceback (most recent call last):
  File "/home/sofeng/lib/python-packages/django/core/management/commands/runserver.py", line 47, in inner_run
    self.validate(display_num_errors=True)
  File "/home/sofeng/lib/python-packages/django/core/management/base.py", line 122, in validate
    num_errors = get_validation_errors(s, app)
  File "/home/sofeng/lib/python-packages/django/core/management/validation.py", line 28, in get_validation_errors
    for (app_name, error) in get_app_errors().items():
  File "/home/sofeng/lib/python-packages/django/db/models/loading.py", line 128, in get_app_errors  
    self._populate()
  File "/home/sofeng/lib/python-packages/django/db/models/loading.py", line 57, in _populate
    self.load_app(app_name, True)
  File "/home/sofeng/lib/python-packages/django/db/models/loading.py", line 72, in load_app
    mod = __import__(app_name, {}, {}, ['models'])
  File "/home/sofeng/src/django/mozblog/myblogapp/models.py", line 30, in <module>
    class LegacyComment(models.Model):
  File "/home/sofeng/src/django/mozblog/myblogapp/models.py", line 32, in LegacyComment
    website = models.URLField(core=False)
  File "/home/sofeng/lib/python-packages/django/db/models/fields/__init__.py", line 828, in __init__
    CharField.__init__(self, verbose_name, name, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'core'

I removed the core argument from my models. This is an oldforms-related thing that has been removed. See here

Error while importing URLconf 'myblogsite.urls': cannot import name FreeComment

Comments have been refactored. See the Upgrading Guide

Notes on Django and MySql on Amazon's EC2

Install Elasticfox

Install the Elasticfox Firefox Extension for Amazon EC2: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=609

Set up Amazon EC2 accounts and Elasticfox

Follow Arope's instructions for setting up Amazon EC2 accounts and Elasticfox. I used the alestic/ubuntu-8.04-hardy-base-20080628.manifest.xml machine image.

view standard apache page

In Elasticfox, right-click on your running instance and select "Copy Public DNS Name to clipboard". Then, paste that address in your browser. You should see Apache's "It works!" page.

ssh into instance

In Elasticfox, right-click on your running instance and select "SSH to Public Domain Name"

install stuff

Ubuntu Hardy has the following versions:

  • Apache 2.2.8
  • Mod_python 3.3.1
  • MySql 5.0.51
  • Django 0.96.1

On your remote instance, do the following.

# apt-get update
# apt-get install python-django
# apt-get install mysql-server
# apt-get install python-mysqldb
# apt-get install libapache2-mod-python

Update 2008-09-09: The Django mod_python documentation recommends using Apache's prefork MPM as opposed to the worker MPM. The worker MPM was installed by default on my Alestic Ubuntu image so I uninstalled it and replaced it with the prefork version.

# apt-get autoremove --purge apache2-mpm-worker
# apt-get install apache2-mpm-prefork

To see your current version of Apache, run the command: apache2 -V

create a django project

# cd /srv
# django-admin startproject mysite

configure django mod_python

See also Jeff Baier's article: Installing Django on an Ubuntu Linux Server for more information.

Edit /etc/apache2/httpd.conf and insert the following:

<Location "/">
    SetHandler python-program
    PythonHandler django.core.handlers.modpython
    SetEnv DJANGO_SETTINGS_MODULE mysite.settings
    PythonPath "['/srv'] + sys.path"
    PythonDebug On
</Location>

restart the apache server

# /etc/init.d/apache2 restart

You should see Django's "It Worked!" page.

Set up a MySql database and user

Note, use the password you entered when installing MySql

# mysql -u root -p
Enter password: 

mysql> CREATE DATABASE django_db;
Query OK, 1 row affected (0.01 sec)

mysql> GRANT ALL ON django_db.* TO 'djangouser'@'localhost' IDENTIFIED BY 'yourpassword';
Query OK, 0 rows affected (0.03 sec)

mysql> quit
Bye

Edit the Django database settings

Edit mysite/settings.py:

DATABASE_ENGINE = 'mysql'           # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'ado_mssql'.
DATABASE_NAME = 'django_db'             # Or path to database file if using sqlite3.
DATABASE_USER = 'djangouser'             # Not used with sqlite3.
DATABASE_PASSWORD = 'yourpassword'         # Not used with sqlite3.
DATABASE_HOST = ''             # Set to empty string for localhost. Not used with sqlite3.
DATABASE_PORT = ''             # Set to empty string for default. Not used with sqlite3.

Do a 'syncdb' to create the database tables

# cd mysite
# python manage.py syncdb
Creating table auth_message
Creating table auth_group
Creating table auth_user
Creating table auth_permission
Creating table django_content_type
Creating table django_session
Creating table django_site

You just installed Django's auth system, which means you don't have any superusers defined.
Would you like to create one now? (yes/no): yes
Username (Leave blank to use 'sofeng'):    
E-mail address: [email protected]
Password: 
Password (again): 
Superuser created successfully.
Installing index for auth.Message model
Installing index for auth.Permission model
Loading 'initial_data' fixtures...
No fixtures found.

upload a mercurial django project

on the remote instance, install mercurial:

# apt-get install mercurial

on your local machine with the mercurial repo, run:

$ hg clone -e 'ssh -i /home/sofeng/.ec2-elasticfox/id_django-keypair.pem' yourproj ssh://[email protected]//srv/yourproj

where /home/sofeng/.ec2-elasticfox/id_django-keypair.pem is the private key associated with your instance and yourdns.compute-1.amazonaws.com is the public domain name associated with your instance.

back on the remote instance:

# cd /srv/mozblog
# hg update
# python manage.py syncdb

set up apache to serve static files

  • Create a link to the media files:
    # cd /var/www
    # ln -s /srv/mozblog/media site_media
    # ln -s /usr/share/python-support/python-django/django/contrib/admin/media/ admin_media
  • Edit /etc/apache2/httpd.conf:
    <Location "/">
        SetHandler python-program
        PythonHandler django.core.handlers.modpython
        SetEnv DJANGO_SETTINGS_MODULE mozblog.settings
        PythonPath "['/srv'] + sys.path"
        PythonDebug On
    </Location>
    <Location "/site_media">
        SetHandler None
    </Location>
    <Location "/admin_media">
        SetHandler None
    </Location>

Restart the apache server

# /etc/init.d/apache2 restart

How to conditionally replace items in a list

I wanted to replace items in a list based on a specific condition. For example, given a list of numbers, I want to replace all items that are negative with zero.

Naive way

At first, I thought of something like this:

mylist = [111, -222, 333, -444]
newlist = []
for item in mylist:
    if item < 0:
        item = 0
    newlist.append(item)
mylist = newlist
print mylist

Which gave me the expected results:

[111, 0, 333, 0]

Better way?

Then I tried using Python's built-in enumerate function (see my previous example) to replace the items in place. This seems to be a more elegant solution to me. Is there a better way? How would you do it?

mylist = [111, -222, 333, -444]
for (i, item) in enumerate(mylist):
    if item < 0:
        mylist[i] = 0
print mylist

Results:

[111, 0, 333, 0]
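
Another option worth considering is a list comprehension with a conditional expression. This is a minimal sketch, written for Python 3. Note that it builds a new list rather than modifying the original one, so the last line uses slice assignment for the case where the same list object has to be updated.

```python
mylist = [111, -222, 333, -444]

# Conditional expression: keep the item unless it is negative.
newlist = [0 if item < 0 else item for item in mylist]

# For this particular condition, max() expresses the same thing.
clamped = [max(item, 0) for item in mylist]

# Slice assignment replaces the contents of the existing list object.
mylist[:] = newlist
print(mylist)
```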

Django Blog Project #12: Adding Pygments syntax highlighting

I've finally added automatic code highlighting to my blog. It uses Pygments to do the syntax highlighting and Beautiful Soup to find all the <pre> blocks to highlight. I still write my blog posts in HTML, but now add a class attribute to my <pre> tags to specify the Pygments lexer to use. For example, for python code, I use:

<pre class="python">
import this
def demo():
    pass</pre>

Which turns into:

import this
def demo():
    pass

I bought James Bennett's book, Practical Django Projects about a month ago and it has good information about creating a blog with Django. It also documented techniques for syntax highlighting which I used here. To summarize, I added a new attribute, called body_highlighted to my Post model. Then, I added a custom save() method which parses my original HTML with Beautiful Soup and highlights it with Pygments.

Model changes

Here is the relevant code in ~/src/django/myblogsite/myblogapp/models.py:

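# Imports the excerpt relies on (BeautifulSoup 3 and Pygments)
from BeautifulSoup import BeautifulSoup
from django.db import models
from pygments import formatters, highlight, lexers

class Post(models.Model):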
    # ...
    body = models.TextField()
    body_highlighted = models.TextField(editable=False, blank=True)

    def save(self):
        self.body_highlighted = self.highlight_code(self.body)
        super(Post, self).save()

    def highlight_code(self, html):
        soup = BeautifulSoup(html)
        preblocks = soup.findAll('pre')
        for pre in preblocks:
            if pre.has_key('class'):
                try:
                    code = ''.join([unicode(item) for item in pre.contents])
                    code = self.unescape_html(code)
                    lexer = lexers.get_lexer_by_name(pre['class'])
                    formatter = formatters.HtmlFormatter()
                    code_hl = highlight(code, lexer, formatter)
                    pre.replaceWith(BeautifulSoup(code_hl))
                except:
                    pass
        return unicode(soup)

    def unescape_html(self, html):
        html = html.replace('&lt;', '<')
        html = html.replace('&gt;', '>')
        html = html.replace('&amp;', '&')
        return html

Update 2010-04-09: I added the unescape_html method so that I could highlight Python code with regular expression named groups. For example:

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")

With the new fix in place, I just need to escape the < and > characters with &lt; and &gt; and the syntax highlighting will display correctly. Before I made the fix, if I did not escape the characters, BeautifulSoup would add closing tags to what it thought was my malformed HTML. So instead of the above, it looked like this:

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")</last_name></first_name>

If anyone knows of a better solution, please let me know.
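
One possible alternative, if the code runs on Python 3.4 or later, is the standard library's html.unescape(), which handles these three entities along with the rest of the named and numeric ones. The sketch below is not what my blog uses. It only checks that an escaped version of the line above unescapes to the original source, and that the named groups behave as expected.

```python
import html
import re

# The regex example as it would be written inside a <pre> block.
escaped = (r'm = re.match(r"(?P&lt;first_name&gt;\w+) '
           r'(?P&lt;last_name&gt;\w+)", "Malcolm Reynolds")')

# html.unescape turns the entities back into literal characters.
unescaped = html.unescape(escaped)
print(unescaped)

# The named groups from the example, for reference.
m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
print(m.group('first_name'), m.group('last_name'))
```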

Update the database
  • List the SQL commands Django would use:
    $ cd ~/src/django/myblogsite/
    $ python manage.py sqlall myblogapp
    
    BEGIN;
    CREATE TABLE "myblogapp_post" (
        "id" integer NOT NULL PRIMARY KEY,
        "author_id" integer NOT NULL REFERENCES "auth_user" ("id"),
        "title" varchar(200) NOT NULL,
        "slug" varchar(200) NOT NULL,
        "date_created" datetime NOT NULL,
        "date_modified" datetime NOT NULL,
        "tags" varchar(200) NOT NULL,
        "body" text NOT NULL,
        "body_highlighted" text NOT NULL
    )
    ;
    CREATE INDEX "myblogapp_post_author_id" ON "myblogapp_post" ("author_id");
    CREATE INDEX "myblogapp_post_slug" ON "myblogapp_post" ("slug");
    COMMIT;
  • Enter the sqlite3 shell:
    $ sqlite3 mydatabase.sqlite3
    and enter the following statements:
    sqlite> ALTER TABLE myblogapp_post ADD COLUMN body_highlighted text;
    sqlite> .exit
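
The same ALTER TABLE step can be tried out from Python with the standard sqlite3 module. This is a minimal sketch, written for Python 3, against a throwaway in-memory database. The two-column table is a simplified stand-in for the real myblogapp_post table, and add_column_demo is a made-up helper name.

```python
import sqlite3


def add_column_demo():
    """Add body_highlighted to a simplified table and inspect the result."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    # Simplified stand-in for the real myblogapp_post table.
    conn.execute(
        'CREATE TABLE myblogapp_post ('
        '"id" integer NOT NULL PRIMARY KEY, "body" text NOT NULL)'
    )
    conn.execute("INSERT INTO myblogapp_post (body) VALUES ('<pre>hello</pre>')")

    conn.execute("ALTER TABLE myblogapp_post ADD COLUMN body_highlighted text")

    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(myblogapp_post)")]
    existing = conn.execute(
        "SELECT body_highlighted FROM myblogapp_post"
    ).fetchone()[0]
    conn.close()
    return columns, existing


columns, existing = add_column_demo()
print(columns, existing)
```

Rows that existed before the ALTER TABLE get NULL in the new column until they are saved again.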

Update the template

Here is the relevant code in ~/src/django/myblogsite/templates/singlepost.html:

    {% if post.body_highlighted %}
      {{ post.body_highlighted|safe }}
    {% else %}
      {{ post.body|safe }}
    {% endif %}

Add CSS for Pygments

One last step is to add the CSS for Pygments. Here is an excerpt from my ~/src/django/myblogsite/media/css/mystyle.css:

/* PYGMENTS STYLE */
/* customized */
.c  { color: #008040; font-style: italic } /* Comment */
.cm { color: #008040; font-style: italic } /* Comment.Multiline */
.cp { color: #BC7A00 } /* Comment.Preproc */
.c1 { color: #008040; font-style: italic } /* Comment.Single */
.cs { color: #008040; font-style: italic } /* Comment.Special */
.gd { color: grey; text-decoration: line-through } /* Generic.Deleted */
.gi { color: red; } /* Generic.Inserted */
.k  { color: #000080; font-weight: bold } /* Keyword */
.kc { color: #000000; font-weight: bold } /* Keyword.Constant */
.kd { color: #000000; font-weight: bold } /* Keyword.Declaration */
.kp { color: #000000 } /* Keyword.Pseudo */
.kr { color: #000000; font-weight: bold } /* Keyword.Reserved */
.kt { color: #000000; font-weight: bold } /* Keyword.Type */

/* original settings */
.err { border: 1px solid #FF0000 } /* Error */
.o { color: #666666 } /* Operator */
.ge { font-style: italic } /* Generic.Emph */
.gr { color: #FF0000 } /* Generic.Error */
.gh { color: #000080; font-weight: bold } /* Generic.Heading */
.go { color: #808080 } /* Generic.Output */
.gp { color: #000080; font-weight: bold } /* Generic.Prompt */
.gs { font-weight: bold } /* Generic.Strong */
.gu { color: #800080; font-weight: bold } /* Generic.Subheading */
.gt { color: #0040D0 } /* Generic.Traceback */
.m { color: #666666 } /* Literal.Number */
.s { color: #BA2121 } /* Literal.String */
.na { color: #7D9029 } /* Name.Attribute */
.nb { color: #008000 } /* Name.Builtin */
.nc { color: #0000FF; font-weight: bold } /* Name.Class */
.no { color: #880000 } /* Name.Constant */
.nd { color: #AA22FF } /* Name.Decorator */
.ni { color: #999999; font-weight: bold } /* Name.Entity */
.ne { color: #D2413A; font-weight: bold } /* Name.Exception */
.nf { color: #0000FF } /* Name.Function */
.nl { color: #A0A000 } /* Name.Label */
.nn { color: #0000FF; font-weight: bold } /* Name.Namespace */
.nt { color: #008000; font-weight: bold } /* Name.Tag */
.nv { color: #19177C } /* Name.Variable */
.ow { color: #AA22FF; font-weight: bold } /* Operator.Word */
.w { color: #bbbbbb } /* Text.Whitespace */
.mf { color: #666666 } /* Literal.Number.Float */
.mh { color: #666666 } /* Literal.Number.Hex */
.mi { color: #666666 } /* Literal.Number.Integer */
.mo { color: #666666 } /* Literal.Number.Oct */
.sb { color: #BA2121 } /* Literal.String.Backtick */
.sc { color: #BA2121 } /* Literal.String.Char */
.sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */
.s2 { color: #BA2121 } /* Literal.String.Double */
.se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */
.sh { color: #BA2121 } /* Literal.String.Heredoc */
.si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */
.sx { color: #008000 } /* Literal.String.Other */
.sr { color: #BB6688 } /* Literal.String.Regex */
.s1 { color: #BA2121 } /* Literal.String.Single */
.ss { color: #19177C } /* Literal.String.Symbol */
.bp { color: #008000 } /* Name.Builtin.Pseudo */
.vc { color: #19177C } /* Name.Variable.Class */
.vg { color: #19177C } /* Name.Variable.Global */
.vi { color: #19177C } /* Name.Variable.Instance */
.il { color: #666666 } /* Literal.Number.Integer.Long */

All pau. Now we should have pretty syntax highlighted code! (For those keeping track, this is now version 0.1.3 of my blog.)