SaltyCrane: python

Notes on Fabric 2 and Python 3

2021-02-07T19:49:30-08:00

Fabric 2 is a Python package used for running commands on remote machines via SSH. Fabric 2 supports Python 3 and is a rewrite of the Fabric I used years ago. Here are my notes on using Fabric 2 and Python 3.

Set up SSH config and SSH agent

Create or edit your ~/.ssh/config file to contain your remote host parameters

Host myhost
    User myusername
    HostName myhost.com
    IdentityFile ~/.ssh/id_rsa

Add your private key to your SSH agent
```
$ ssh-add ~/.ssh/id_rsa
```

Create a project, create a virtualenv, and install fabric2

$ mkdir -p /tmp/my-project
$ cd /tmp/my-project
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install fabric2

Create a fabfile.py script

Create a file /tmp/my-project/fabfile.py with the following contents. Note: "myhost" is the same name used in ~/.ssh/config described above.

from fabric2 import task

hosts = ["myhost"]

@task(hosts=hosts)
def mytask(c):
    print("Starting mytask...")
    with c.cd("/var"):
        c.run("ls -l")
    print("Done.")

Run the fabric script

In /tmp/my-project, with the virtualenv activated, run the fabric task to list the contents of /var on the remote host.

$ fab2 mytask

Output:

Starting mytask...

total 48
drwxr-xr-x  2 root root   4096 backups
drwxr-xr-x  9 root root   4096 cache
drwxrwxrwt  2 root root   4096 crash
drwxr-xr-x 38 root root   4096 lib
drwxrwsr-x  2 root root   4096 local
drwxrwxrwt  2 root root   4096 lock
drwxrwxr-x 14 root root   4096 log
drwxrwsr-x  2 root root   4096 mail
drwxr-xr-x  2 root root   4096 opt
drwxr-xr-x  5 root root   4096 spool
drwxrwxrwt  2 root root   4096 tmp
drwxr-xr-x  3 root root   4096 www

Done.

Creating a GraphQL API with Python, Graphene, and Postgres

2019-02-20T21:44:39-08:00

Here are my notes on creating a GraphQL API with Python, Django, Graphene, and Postgres. I learned almost everything from the excellent GraphQL Python tutorial at howtographql.com. Optimistically, I'll write a series of posts about how to make a React Native app that no one will use. Realistically, I won't.

Install Python
Create project directory and virtualenv
Install Django and create a Django project
Run Postgres in Docker
Create a database
Configure Django to use Postgres
Install and configure Graphene
Create a new Django app and add a model
GraphQL all the things
Try it using the GraphiQL explorer
References / See also

Install Python 3.7.2

$ brew install python

Create project directory and virtualenv

$ # make project directory
$ mkdir travelog-api
$ cd travelog-api
$ # make virtualenv
$ python3 -m venv venv
$ # activate virtualenv
$ source venv/bin/activate
$ # upgrade pip
$ pip install --upgrade pip

Install Django and create a Django project

(Still in the travelog-api directory with virtualenv activated)

Install Django 2.1.7:
```
$ pip install Django 
```

Create Django project:

$ django-admin startproject travelog_api ./

Run migrations and run the server:

$ ./manage.py migrate
$ ./manage.py runserver
$ # go to http://localhost:8000 in the browser

Run Postgres in Docker

(Still in the travelog-api directory with virtualenv activated)

Install Docker for Mac

Create a new file, travelog-api/docker-compose.yml:

version: "3.7"
services:
  db:
    image: "postgres:11.2"
    container_name: "travelog_postgres1"
    ports:
      - "54321:5432"
    volumes:
      - postgres_data1:/var/lib/postgresql/data
volumes:
  postgres_data1:
    name: travelog_postgres_data1

Start Postgres

$ docker-compose up -d
$ docker-compose logs

Create a database

Start psql:

$ docker exec -it travelog_postgres1 psql -U postgres

Create a database (be sure to include the semicolon):
```
postgres=# create database travelog;
```

Create user:

postgres=# create user traveloguser with password 'mypassword';
postgres=# grant all privileges on database travelog to traveloguser;

Exit psql:
```
postgres=# \q
```

Configure Django to use Postgres

(Still in the travelog-api directory with virtualenv activated)

Install psycopg2 2.7.7:
```
$ pip install psycopg2-binary
```

Edit travelog-api/travelog_api/settings.py:

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "travelog",
        "USER": "traveloguser",
        "PASSWORD": "mypassword",
        "HOST": "localhost",
        "PORT": "54321",
    }
}

Run database migrations and run the server:

$ ./manage.py migrate
$ ./manage.py runserver
$ # go to http://localhost:8000 in the browser

Install and configure Graphene

(Still in the travelog-api directory with virtualenv activated)

Install graphene-django 2.2.0
```
$ pip install graphene-django
```

Edit the INSTALLED_APPS setting in travelog-api/travelog_api/settings.py:

INSTALLED_APPS = (
    # After the default packages
    "graphene_django",
)

Create a new Django app and add a model

(Still in the travelog-api directory with virtualenv activated)

Create a new Django app:
```
$ ./manage.py startapp geo
```

Edit travelog_api/settings.py:

INSTALLED_APPS = (
    # After the default packages
    'graphene_django',
    'geo',
)

Edit travelog-api/geo/models.py:

from django.db.models import DateTimeField, FloatField, Model, TextField

class Location(Model):
    created_at = DateTimeField(auto_now_add=True)
    lat = FloatField()
    lon = FloatField()
    name = TextField(blank=True)
    updated_at = DateTimeField(auto_now=True)

Make and run migrations:

$ ./manage.py makemigrations
$ ./manage.py migrate

GraphQL all the things

(Still in the travelog-api directory with virtualenv activated)

Create a new file travelog-api/geo/schema.py:

import graphene
from graphene_django.types import DjangoObjectType
from .models import Location

class LocationType(DjangoObjectType):
    class Meta:
        model = Location

class Query(object):
    all_locations = graphene.List(LocationType)

    def resolve_all_locations(self, info, **kwargs):
        return Location.objects.all()

class CreateLocation(graphene.Mutation):
    location = graphene.Field(LocationType)

    class Arguments:
        lat = graphene.Float()
        lon = graphene.Float()
        name = graphene.String()

    def mutate(self, info, lat, lon, name):
        loc = Location(lat=lat, lon=lon, name=name)
        loc.save()
        return CreateLocation(location=loc)

class Mutation(graphene.ObjectType):
    create_location = CreateLocation.Field()

Create a new file travelog-api/travelog_api/schema.py:

import graphene
import geo.schema

class Query(geo.schema.Query, graphene.ObjectType):
    pass

class Mutation(geo.schema.Mutation, graphene.ObjectType):
    pass

schema = graphene.Schema(query=Query, mutation=Mutation)

Edit travelog-api/travelog_api/urls.py:

from django.contrib import admin
from django.urls import path
from graphene_django.views import GraphQLView
from .schema import schema

urlpatterns = [
    path("admin/", admin.site.urls),
    path("graphql/", GraphQLView.as_view(graphiql=True, schema=schema)),
]

Try it using the GraphiQL explorer

(Still in the travelog-api directory with virtualenv activated)

Run the server:
```
$ ./manage.py runserver
```
Go to http://localhost:8000/graphql/ and you should see the GraphiQL interactive GraphQL explorer.

Create a location. Enter this mutation in the left pane and hit CTRL+ENTER:

mutation {
  createLocation(name: "my first location", lat: 1, lon: 2) {
    location {
      id
    }
  }
}

See the response:

{
  "data": {
    "createLocation": {
      "location": {
        "id": "1"
      }
    }
  }
}

Query all locations:

query {
  allLocations {
    createdAt
    id
    lat
    lon
    name
  }
}

See the response:

{
  "data": {
    "allLocations": [
      {
        "createdAt": "2019-02-22T06:39:08.512197+00:00",
        "id": "1",
        "lat": 1,
        "lon": 2,
        "name": "my first location"
      }
    ]
  }
}

References / See also

Switching to OS X and front end development

2015-09-10T08:26:36-07:00

After 7 years, I've yielded to the Dark Side and switched from Ubuntu to OS X on my work laptop. I've also switched from Python and back end web development to JavaScript and front end development. The former is mostly to support the latter.

Linux is rare¹, especially among front end developers, and I want to make it easy to collaborate as I learn new things. I've had problems working with Photoshop files in GIMP and I couldn't run the iOS simulator. Issues with Linux device drivers don't help.

I'm choosing front end development because I want to code closer to the end user.² In small part like Ian Bicking wrote last year, I feel unexcited about back end development and really excited about JavaScript and front end development. I'm excited about ES 2015 and React and React Native and CSS transitions.³ I'm even coming around to Node.js. JavaScript is uglier than Python, but it's getting better and there are things Python can't do that JavaScript can.⁴ If only beauty mattered, maybe I'd use Scheme.⁵

I'm sure I will hate OS X at first, but hopefully it will be good in the long run. If anyone can recommend a tiling window manager like Qtile for OS X, please let me know.

(I will continue using Emacs because Emacs rocks! ⁶)

¹ I think I was the last person at my company running Linux.
² I've been trying to do front end work for years now, but I finally got a sensible chance to switch as my company is changing its technology stack from Python to Ruby and Backbone/Angular to React.
³ Update 2016-01-04: Here are even more exciting web technologies: Electron, progressive web apps, and WebAssembly.
⁴ Update 2016-01-01: I found James Hague had similar thoughts on Python and JavaScript.
⁵ Speaking of functional languages and JavaScript, Elm sounds pretty cool.
⁶ Update 2016-01-26: Or will I?

Logging the actual host (instead of localhost) with the Python elasticsearch client

2015-08-18T23:12:45-07:00

The Elasticsearch Python library has a cool feature that allows you to log the equivalent curl command of the query you are making. To use it you just need to configure the elasticsearch.trace logger. There is one annoyance: by design, "localhost:9200" is logged even if you have configured another host. It is more useful to log the actual host I am using so I can copy and paste the command to co-workers without changing the hostname. At first I thought I'd monkeypatch the library to do what I wanted. Then I remembered I could use a logging.Filter to rewrite the log message.

I am using the high level elasticsearch-dsl library in my example, but the logging is actually done by the low level official elasticsearch library which elasticsearch-dsl wraps.

$ pip install elasticsearch-dsl

import logging

import elasticsearch_dsl as esdsl  # high level elasticsearch library
from elasticsearch import Elasticsearch  # low level official elasticsearch library


ELASTICSEARCH_HOSTS = [
    'http://es1.myhost.com:9200',
    'http://es2.myhost.com:9200',
    'http://es3.myhost.com:9200',
]


class ReplaceHostFilter(logging.Filter):
    """
    Replace "localhost:9200" with real host
    """
    def __init__(self, real_host, *args, **kwargs):
        self.real_host = real_host
        super(ReplaceHostFilter, self).__init__(*args, **kwargs)

    def filter(self, record):
        record.msg = record.msg.replace('http://localhost:9200', self.real_host)
        record.msg = record.msg.replace('curl', 'curl -sS')
        record.msg += ' | python -mjson.tool'
        return True


# Set up logging
host_replacer = ReplaceHostFilter(ELASTICSEARCH_HOSTS[0])
formatter = logging.Formatter("%(levelname)s %(asctime)s %(name)s - %(message)s")
sh = logging.StreamHandler()
sh.setFormatter(formatter)
eslogger = logging.getLogger('elasticsearch.trace')
eslogger.setLevel('INFO')
eslogger.addFilter(host_replacer)
eslogger.addHandler(sh)


# Make a query
esclient = Elasticsearch(hosts=ELASTICSEARCH_HOSTS)
search = esdsl.Search(using=esclient, index='my_index')
filt = esdsl.F('term', myTerm=1234)
search = search.filter(filt)
response = search.execute()
print response.hits.total

Log output showing the actual host instead of localhost:

INFO 2015-07-28 13:48:54,464 elasticsearch.trace - curl -sS -XGET 'http://es1.myhost.com:9200/my_index/_search?pretty' -d '{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "myTerm": 1234
        }
      },
      "query": {
        "match_all": {}
      }
    }
  }
}' | python -mjson.tool
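
The filter technique can also be tried without an Elasticsearch cluster. Below is a minimal Python 3 sketch using only the standard library. The logger name "demo.trace", the ReplaceTextFilter class, and the sample log message are made up for illustration and are not part of the elasticsearch library.

```python
import io
import logging


class ReplaceTextFilter(logging.Filter):
    """Rewrite each record's message, replacing `old` with `new`."""

    def __init__(self, old, new):
        super().__init__()
        self.old = old
        self.new = new

    def filter(self, record):
        # getMessage() merges record.args into record.msg, so rewrite the
        # merged text and clear args to avoid formatting it a second time.
        record.msg = record.getMessage().replace(self.old, self.new)
        record.args = ()
        return True  # returning True keeps the record


# Capture output in memory so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s - %(message)s"))

demo_logger = logging.getLogger("demo.trace")  # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
demo_logger.addFilter(
    ReplaceTextFilter("http://localhost:9200", "http://es1.myhost.com:9200"))
demo_logger.addHandler(handler)

demo_logger.info("curl -XGET '%s/my_index/_search'", "http://localhost:9200")
output = stream.getvalue()
print(output, end="")
```

Because the filter is attached to the logger rather than to a handler, the message is rewritten before any handler on that logger sees it.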

Calling JavaScript from Python to de-CloudFlare scraped content

2015-07-15T21:10:48-07:00

Yesterday I wrote a script to scrape my own web page because I screwed up the CSV export feature and Product needed the data. One problem was that the CloudFlare CDN obfuscated the email addresses on the page. My ~~solution~~ crazy hack: running a Node.js script to de-obfuscate the email from my Python scraping script.

Example obfuscated email stuff from CloudFlare:

<a href="/cdn-cgi/l/email-protection#d4b7b5a6b194a4b1a0e7e2e4fab7bbb9">
    <span class="__cf_email__" data-cfemail="0162607364417164753237312f626e6c">[email protected]</span>
    <script cf-hash='f9e31' type="text/javascript">
     /* <![CDATA[ */!function(){try{var t="currentScript"in document?document.currentScript:function(){for(var t=document.getElementsByTagName("script"),e=t.length;e--;)if(t[e].getAttribute("cf-hash"))return t[e]}();if(t&&t.previousSibling){var e,r,n,i,c=t.previousSibling,a=c.getAttribute("data-cfemail");if(a){for(e="",r=parseInt(a.substr(0,2),16),n=2;a.length-n;n+=2)i=parseInt(a.substr(n,2),16)^r,e+=String.fromCharCode(i);e=document.createTextNode(e),c.parentNode.replaceChild(e,c)}}}catch(u){}}();/* ]]> */
    </script>
</a>

Using jsbeautifier.org, I adapted the JavaScript from above into this Node.js script, decloudflare.js.

var e, r, n, i, a = process.argv[2];
for (e = "", r = parseInt(a.substr(0, 2), 16), n = 2; a.length - n; n += 2) i = parseInt(a.substr(n, 2), 16) ^ r, e += String.fromCharCode(i);
console.log(e);

Example usage:

$ node decloudflare.js 0162607364417164753237312f626e6c
care@pet360.com

I used the Naked library (thanks to Sweetmeat) to call the Node.js script. (Though probably I could've just used the subprocess module.)

$ pip install Naked

from Naked.toolshed.shell import muterun_js

def decloudflare_email(cfemail):
    resp = muterun_js('decloudflare.js', cfemail)
    return resp.stdout.rstrip()

cfemail = '0162607364417164753237312f626e6c'
print 'cfemail from python: ' + cfemail
email = decloudflare_email(cfemail)
print 'email from python: ' + email

cfemail from python: 0162607364417164753237312f626e6c
email from python: care@pet360.com
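
For comparison, the same decoding can be done without Node.js. This is a sketch in Python 3 that mirrors the loop in decloudflare.js: the first two hex digits are the XOR key and each following pair of hex digits is one obfuscated character. The function name decloudflare_email_py is only an illustrative choice.

```python
def decloudflare_email_py(cfemail):
    """Decode a data-cfemail hex string by XOR-ing each byte with the first byte."""
    key = int(cfemail[:2], 16)  # the first hex pair is the XOR key
    chars = []
    for i in range(2, len(cfemail), 2):
        # every following hex pair is one character XOR-ed with the key
        chars.append(chr(int(cfemail[i:i + 2], 16) ^ key))
    return "".join(chars)


decoded = decloudflare_email_py("0162607364417164753237312f626e6c")
print(decoded)  # care@pet360.com
```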

An example using Python's groupby and defaultdict to do the same task

2014-10-09T22:45:10-07:00

Here is some data that I want to group by model:

SOME_DATA = [
    {'model': u'Yaris', 'some_value': 11202, 'trim_name': u'3-Door L Manual'},
    {'model': u'Yaris', 'some_value': 19269, 'trim_name': u'3-Door LE Automatic'},
    {'model': u'Corolla', 'some_value': 27119, 'trim_name': u'L Automatic'},
    {'model': u'Corolla', 'some_value': 32262, 'trim_name': u'LE'},
    {'model': u'Corolla', 'some_value': 37976, 'trim_name': u'S Premium'},
    {'model': u'Camry', 'some_value': 39730, 'trim_name': u'LE 4-Cyl'},
    {'model': u'Camry', 'some_value': 45761, 'trim_name': u'XSE 4-Cyl'},
    {'model': u'Yaris', 'some_value': 48412, 'trim_name': u'3-Door L Automatic'},
    {'model': u'Camry', 'some_value': 55423, 'trim_name': u'XLE 4-Cyl'},
    {'model': u'Corolla', 'some_value': 57055, 'trim_name': u'ECO Premium'},
    {'model': u'Corolla', 'some_value': 61296, 'trim_name': u'ECO Plus'},
    {'model': u'Camry', 'some_value': 63660, 'trim_name': u'XSE V6'},
    {'model': u'Yaris', 'some_value': 65570, 'trim_name': u'5-Door LE Automatic'},
    {'model': u'Camry', 'some_value': 67461, 'trim_name': u'XLE V6'},
    {'model': u'Corolla', 'some_value': 73602, 'trim_name': u'S'},
    {'model': u'Yaris', 'some_value': 74158, 'trim_name': u'5-Door SE Manual'},
    {'model': u'Corolla', 'some_value': 74249, 'trim_name': u'LE Plus'},
    {'model': u'Corolla', 'some_value': 78386, 'trim_name': u'ECO'},
    {'model': u'Camry', 'some_value': 82747, 'trim_name': u'SE 4-Cyl'},
    {'model': u'Corolla', 'some_value': 83162, 'trim_name': u'LE Premium'},
    {'model': u'Corolla', 'some_value': 84863, 'trim_name': u'S Plus Manual'},
    {'model': u'Yaris', 'some_value': 90313, 'trim_name': u'5-Door L Automatic'},
    {'model': u'Corolla', 'some_value': 90452, 'trim_name': u'L Manual'},
    {'model': u'Yaris', 'some_value': 93152, 'trim_name': u'5-Door SE Automatic'},
    {'model': u'Corolla', 'some_value': 94973, 'trim_name': u'S Plus CVT'},
]

This can be done using defaultdict from the collections module.

import collections
from pprint import pprint

grouped = collections.defaultdict(list)
for item in SOME_DATA:
    grouped[item['model']].append(item)

for model, group in grouped.items():
    print
    print model
    pprint(group, width=150)

Here are the results:

Yaris
[{'model': u'Yaris', 'some_value': 27065, 'trim_name': u'5-Door L Automatic'},
 {'model': u'Yaris', 'some_value': 32757, 'trim_name': u'5-Door SE Automatic'},
 {'model': u'Yaris', 'some_value': 57344, 'trim_name': u'3-Door L Manual'},
 {'model': u'Yaris', 'some_value': 64002, 'trim_name': u'5-Door SE Manual'},
 {'model': u'Yaris', 'some_value': 77974, 'trim_name': u'3-Door L Automatic'},
 {'model': u'Yaris', 'some_value': 92658, 'trim_name': u'3-Door LE Automatic'},
 {'model': u'Yaris', 'some_value': 98769, 'trim_name': u'5-Door LE Automatic'}]

Camry
[{'model': u'Camry', 'some_value': 30247, 'trim_name': u'XSE 4-Cyl'},
 {'model': u'Camry', 'some_value': 33809, 'trim_name': u'XSE V6'},
 {'model': u'Camry', 'some_value': 65637, 'trim_name': u'LE 4-Cyl'},
 {'model': u'Camry', 'some_value': 67329, 'trim_name': u'SE 4-Cyl'},
 {'model': u'Camry', 'some_value': 76269, 'trim_name': u'XLE 4-Cyl'},
 {'model': u'Camry', 'some_value': 87438, 'trim_name': u'XLE V6'}]

Corolla
[{'model': u'Corolla', 'some_value': 11239, 'trim_name': u'S'},
 {'model': u'Corolla', 'some_value': 27356, 'trim_name': u'S Plus Manual'},
 {'model': u'Corolla', 'some_value': 44792, 'trim_name': u'L Manual'},
 {'model': u'Corolla', 'some_value': 56252, 'trim_name': u'ECO Premium'},
 {'model': u'Corolla', 'some_value': 78570, 'trim_name': u'S Plus CVT'},
 {'model': u'Corolla', 'some_value': 78964, 'trim_name': u'LE Premium'},
 {'model': u'Corolla', 'some_value': 82116, 'trim_name': u'ECO'},
 {'model': u'Corolla', 'some_value': 85467, 'trim_name': u'S Premium'},
 {'model': u'Corolla', 'some_value': 87099, 'trim_name': u'L Automatic'},
 {'model': u'Corolla', 'some_value': 91974, 'trim_name': u'LE Plus'},
 {'model': u'Corolla', 'some_value': 94862, 'trim_name': u'LE'},
 {'model': u'Corolla', 'some_value': 97625, 'trim_name': u'ECO Plus'}]

This can also be done using itertools.groupby. Because groupby only groups consecutive items that share a key, the data must be sorted by that key first. This method is probably better when working with large datasets because groupby returns each group as an iterator. (This is the reason I convert it to a list before printing.)

import itertools

def keyfunc(x):
    return x['model']

SOME_DATA = sorted(SOME_DATA, key=keyfunc)
for model, group in itertools.groupby(SOME_DATA, keyfunc):
    print
    print model
    pprint(list(group), width=150)

Here are the results:

Camry
[{'model': u'Camry', 'some_value': 36776, 'trim_name': u'SE 4-Cyl'},
 {'model': u'Camry', 'some_value': 56569, 'trim_name': u'LE 4-Cyl'},
 {'model': u'Camry', 'some_value': 57052, 'trim_name': u'XSE 4-Cyl'},
 {'model': u'Camry', 'some_value': 92360, 'trim_name': u'XLE V6'},
 {'model': u'Camry', 'some_value': 92756, 'trim_name': u'XSE V6'},
 {'model': u'Camry', 'some_value': 94413, 'trim_name': u'XLE 4-Cyl'}]

Corolla
[{'model': u'Corolla', 'some_value': 13307, 'trim_name': u'L Automatic'},
 {'model': u'Corolla', 'some_value': 15726, 'trim_name': u'ECO Plus'},
 {'model': u'Corolla', 'some_value': 25579, 'trim_name': u'S'},
 {'model': u'Corolla', 'some_value': 31920, 'trim_name': u'ECO Premium'},
 {'model': u'Corolla', 'some_value': 34480, 'trim_name': u'LE'},
 {'model': u'Corolla', 'some_value': 44958, 'trim_name': u'S Plus Manual'},
 {'model': u'Corolla', 'some_value': 49606, 'trim_name': u'LE Premium'},
 {'model': u'Corolla', 'some_value': 59629, 'trim_name': u'LE Plus'},
 {'model': u'Corolla', 'some_value': 74226, 'trim_name': u'S Plus CVT'},
 {'model': u'Corolla', 'some_value': 75725, 'trim_name': u'L Manual'},
 {'model': u'Corolla', 'some_value': 82382, 'trim_name': u'ECO'},
 {'model': u'Corolla', 'some_value': 95633, 'trim_name': u'S Premium'}]

Yaris
[{'model': u'Yaris', 'some_value': 16789, 'trim_name': u'3-Door L Manual'},
 {'model': u'Yaris', 'some_value': 20349, 'trim_name': u'5-Door LE Automatic'},
 {'model': u'Yaris', 'some_value': 42897, 'trim_name': u'5-Door L Automatic'},
 {'model': u'Yaris', 'some_value': 62045, 'trim_name': u'5-Door SE Automatic'},
 {'model': u'Yaris', 'some_value': 91913, 'trim_name': u'3-Door L Automatic'},
 {'model': u'Yaris', 'some_value': 94218, 'trim_name': u'5-Door SE Manual'},
 {'model': u'Yaris', 'some_value': 97979, 'trim_name': u'3-Door LE Automatic'}]
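
The sorting step matters. Here is a small Python 3 sketch, using a made-up five-row subset of the data, that shows what groupby does with unsorted input and checks that the sorted groupby result matches the defaultdict result.

```python
import collections
import itertools

# A small illustrative subset of the data above (values chosen for the demo).
ROWS = [
    {"model": "Yaris", "trim_name": "3-Door L Manual"},
    {"model": "Corolla", "trim_name": "LE"},
    {"model": "Yaris", "trim_name": "5-Door SE Manual"},
    {"model": "Camry", "trim_name": "XSE V6"},
    {"model": "Corolla", "trim_name": "S"},
]


def keyfunc(row):
    return row["model"]


# Without sorting, groupby starts a new group every time the key changes,
# so "Yaris" and "Corolla" each show up twice.
unsorted_keys = [model for model, _ in itertools.groupby(ROWS, keyfunc)]

# With sorting (sorted() is stable), each model appears exactly once.
sorted_groups = {
    model: [row["trim_name"] for row in group]
    for model, group in itertools.groupby(sorted(ROWS, key=keyfunc), keyfunc)
}

# The defaultdict approach needs no sorting.
by_default = collections.defaultdict(list)
for row in ROWS:
    by_default[row["model"]].append(row["trim_name"])

print(unsorted_keys)
print(sorted_groups == dict(by_default))
```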

A bank style session timeout example using jQuery, Bootstrap, and Flask

2014-09-17T18:38:10-07:00

This is an example that uses JavaScript to display a session timeout warning modal 10 minutes before session expiration. It also resets the session expiration whenever the user clicks the mouse. It uses JavaScript, jQuery, and Bootstrap on the frontend and Python, Flask, Flask-Login, and WTForms on the backend.

Mouse clicks anywhere on the page ping the server at a maximum frequency of once per minute and reset the session expiration.
10 minutes before the session expiration, a warning modal is displayed with two buttons: "Log out" and "Stay Logged In".
If the user clicks "Stay Logged In" the session expiration is reset.
If the user clicks "Log out", the user is logged out.
If the user does nothing for 10 minutes, the user is logged out and shown a message that the session timed out.
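
The timing rules above can be modeled in a few lines. This is only a sketch in Python of the arithmetic, not the implementation used on the page: the SessionClock class and its method names are hypothetical, times are in seconds instead of milliseconds, and the defaults follow the numbers in this post (60 minute session, 10 minute warning, 1 minute minimum ping interval).

```python
class SessionClock:
    """Pure-Python model of the ping throttling and timer arithmetic."""

    def __init__(self, session_lifetime=60 * 60, time_before_warning=10 * 60,
                 min_ping_interval=60):
        self.session_lifetime = session_lifetime
        self.time_before_warning = time_before_warning
        self.min_ping_interval = min_ping_interval
        self.last_ping = None  # time of the last ping, or None before the first

    def activity(self, now):
        """Record user activity at `now`. Return True if the server is pinged."""
        if (self.last_ping is not None
                and now - self.last_ping <= self.min_ping_interval):
            return False  # too soon after the last ping: timers are not reset
        self.last_ping = now
        return True

    def warning_at(self):
        """Time at which the warning modal would be shown."""
        return self.last_ping + self.session_lifetime - self.time_before_warning

    def expires_at(self):
        """Time at which the session would expire."""
        return self.last_ping + self.session_lifetime


clock = SessionClock()
first = clock.activity(0)    # page load: always pings
second = clock.activity(30)  # click 30 s later: throttled
third = clock.activity(90)   # click 90 s after the first ping: pings again
print(first, second, third, clock.warning_at(), clock.expires_at())
```

With the last ping at 90 seconds, the warning is due at 3090 seconds (50 minutes later) and the session expires at 3690 seconds.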

Here is the JavaScript (session-monitor.js):

sessionMonitor = function(options) {
    "use strict";

    var defaults = {
            // Session lifetime (milliseconds)
            sessionLifetime: 60 * 60 * 1000,
            // Amount of time before session expiration when the warning is shown (milliseconds)
            timeBeforeWarning: 10 * 60 * 1000,
            // Minimum time between pings to the server (milliseconds)
            minPingInterval: 1 * 60 * 1000,
            // Space-separated list of events passed to $(document).on() that indicate a user is active
            activityEvents: 'mouseup',
            // URL to ping the server using HTTP POST to extend the session
            pingUrl: '/ping',
            // URL used to log out when the user clicks a "Log out" button
            logoutUrl: '/logout',
            // URL used to log out when the session times out
            timeoutUrl: '/logout?timeout=1',
            ping: function() {
                // Ping the server to extend the session expiration using a POST request.
                $.ajax({
                    type: 'POST',
                    url: self.pingUrl
                });
            },
            logout: function() {
                // Go to the logout page.
                window.location.href = self.logoutUrl;
            },
            onwarning: function() {
                // Below is example code to demonstrate basic functionality. Use this to warn
                // the user that the session will expire and allow the user to take action.
                // Override this method to customize the warning.
                var warningMinutes = Math.round(self.timeBeforeWarning / 60 / 1000),
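                    // Warning markup; the IDs must match the selectors used below.
                    $alert = $('<div id="jqsm-warning">Your session will expire in ' + warningMinutes + ' minutes. ' +
  '<button id="jqsm-stay-logged-in">Stay Logged In</button>' +
  '<button id="jqsm-log-out">Log out</button>' +
  '</div>');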

                if (!$('body').children('div#jqsm-warning').length) {
                    $('body').prepend($alert);
                }
                $('div#jqsm-warning').show();
                $('button#jqsm-stay-logged-in').on('click', self.extendsess)
                    .on('click', function() { $alert.hide(); });
                $('button#jqsm-log-out').on('click', self.logout);
            },
            onbeforetimeout: function() {
                // By default this does nothing. Override this method to perform actions
                // (such as saving draft data) before the user is automatically logged out.
                // This may optionally return a jQuery Deferred object, in which case
                // ontimeout will be executed when the deferred is resolved or rejected.
            },
            ontimeout: function() {
                // Go to the timeout page.
                window.location.href = self.timeoutUrl;
            }
        },
        self = {},
        _warningTimeoutID,
        _expirationTimeoutID,
        // The time of the last ping to the server.
        _lastPingTime = 0;

    function extendsess() {
        // Extend the session expiration. Ping the server and reset the timers if
        // the minimum interval has passed since the last ping.
        var now = $.now(),
            timeSinceLastPing = now - _lastPingTime;

        if (timeSinceLastPing > self.minPingInterval) {
            _lastPingTime = now;
            _resetTimers();
            self.ping();
        }
    }

    function _resetTimers() {
        // Reset the session warning and session expiration timers.
        var warningTimeout = self.sessionLifetime - self.timeBeforeWarning;

        window.clearTimeout(_warningTimeoutID);
        window.clearTimeout(_expirationTimeoutID);
        _warningTimeoutID = window.setTimeout(self.onwarning, warningTimeout);
        _expirationTimeoutID = window.setTimeout(_onTimeout, self.sessionLifetime);
    }

    function _onTimeout() {
        // A wrapper that calls onbeforetimeout and ontimeout and supports asynchronous code.
        $.when(self.onbeforetimeout()).always(self.ontimeout);
    }

    // Add default variables and methods, user specified options, and non-overridable
    // public methods to the session monitor instance.
    $.extend(self, defaults, options, {
        extendsess: extendsess
    });
    // Set an event handler to extend the session upon user activity (e.g. mouseup).
    $(document).on(self.activityEvents, extendsess);
    // Start the timers and ping the server to ensure they are in sync with the backend session expiration.
    extendsess();

    return self;
};

Here is the important HTML:

[HTML template not preserved. Surviving text: page title "Login"; modal heading "Your session is about to expire"; modal body "Your session will expire in [n] minutes due to inactivity."]

Here is the Python Flask app (myapp.py):

import collections
import datetime

from flask import Flask, request, render_template, redirect, url_for, session
from flask_login import (
    LoginManager, login_user, logout_user,  UserMixin, login_required)
from wtforms.fields import PasswordField, StringField
from wtforms.form import Form


UserRow = collections.namedtuple('UserRow', ['id', 'password'])
TOY_USER_DATABASE = {
    'george': UserRow(id=1, password='george'),
}


# settings ###############################################################
# Set a secret key to sign the session (Flask config value)
SECRET_KEY = 'insert secret key here'

# The amount of time after which the user's session expires
# (this is a Flask setting and is also used by the JavaScript)
PERMANENT_SESSION_LIFETIME = datetime.timedelta(minutes=60)


# init ###############################################################
app = Flask(__name__)
app.config.from_object(__name__)
login_manager = LoginManager()
login_manager.init_app(app)
login_manager.login_view = '.login'


@login_manager.user_loader
def load_user(userid):
    return User(userid)


@app.context_processor
def add_session_config():
    """Add current_app.permanent_session_lifetime converted to milliseconds
    to context. The config variable PERMANENT_SESSION_LIFETIME is not
    used because it could be either a timedelta object or an integer
    representing seconds.
    """
    return {
        # total_seconds() also counts whole days; .seconds alone would drop them
        'PERMANENT_SESSION_LIFETIME_MS': int(
            app.permanent_session_lifetime.total_seconds() * 1000),
    }


# models ###############################################################
class User(UserMixin):
    def __init__(self, id):
        self.id = id


# forms ###############################################################
class LoginForm(Form):
    username = StringField()
    password = PasswordField()


# views ###############################################################
@app.route("/login", methods=['GET', 'POST'])
def login():
    form = LoginForm(request.form)
    message = ''

    if request.method == 'POST' and form.validate():
        db_user = TOY_USER_DATABASE.get(form.username.data)
        # .get() returns None for an unknown username, so check before comparing
        if db_user is not None and form.password.data == db_user.password:
            user = User(db_user.id)
            login_user(user)
            return redirect(url_for('.home'))
        else:
            message = 'Login failed.'

    context = {
        'form': form,
        'message': message,
    }
    return render_template('login.html', **context)


@app.route("/")
@login_required
def home():
    return render_template('home.html')


@app.route("/another-page")
@login_required
def another_page():
    return render_template('another_page.html')


@app.route("/logout")
@login_required
def logout():
    logout_user()
    return redirect(url_for('.logged_out') + '?' + request.query_string)


@app.route("/logged-out")
def logged_out():
    timed_out = request.args.get('timeout')
    return render_template('logged_out.html', timed_out=timed_out)


@app.route("/ping", methods=['POST'])
def ping():
    session.modified = True
    return 'OK'


if __name__ == "__main__":
    app.run(debug=True)

The full example is also on GitHub at: https://github.com/saltycrane/session-timeout-example.

Initial ideas were taken from http://www.itworld.com/development/335546/how-create-session-timeout-warning-your-web-application-using-jquery but, from what I could tell, it pinged the server whether the user was active or not.

The old "%" string formatting and the new string .format() method handle unicode differently

2014-07-31T18:40:31-07:00

Today I learned that the old style "%" string formatting and the new string .format() method behave differently when interpolating unicode strings. I was surprised to find out that one of these lines raised an error while one did not:

'%s' % u'O\u2019Connor'

'{}'.format(u'O\u2019Connor')

The old style "%" formatting operation returns a unicode string if one of the values is a unicode string even when the format string is a non-unicode string:

Python 2.7.3 (default, Feb 27 2014, 19:58:35) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '%s' % u'O\u2019Connor'
u'O\u2019Connor'

The new string .format() method called on a non-unicode string with a unicode string argument tries to encode the unicode string to a non-unicode string (bytestring) possibly raising a UnicodeEncodeError:

Python 2.7.3 (default, Feb 27 2014, 19:58:35) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '{}'.format(u'O\u2019Connor')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 1: ordinal not in range(128)

I guess the correct thing to do is to start with a unicode format string:

Python 2.7.3 (default, Feb 27 2014, 19:58:35) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u'{}'.format(u'O\u2019Connor')
u'O\u2019Connor'
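
For what it's worth, this difference is specific to Python 2. Here is a short sketch of the same two lines under Python 3, where every str is already unicode and the bytes type has no .format() method at all (bytes "%" formatting assumes Python 3.5 or later):

```python
name = "O\u2019Connor"

# In Python 3 both styles return the same str and neither raises.
old_style = "%s" % name
new_style = "{}".format(name)
print(old_style == new_style)

# Bytestrings must be encoded explicitly; bytes supports "%" but not .format().
raw = b"%s" % name.encode("utf-8")
print(hasattr(bytes, "format"))
```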

Python logging filters do not propagate like handlers and levels do

2014-02-12T11:15:19-08:00

Loggers are organized in a hierarchical fashion. A logger named 'foo.bar' is a child of a logger named 'foo'.

getLogger() returns a reference to a logger instance with the specified name if it is provided, or root if not. The names are period-separated hierarchical structures. Multiple calls to getLogger() with the same name will return a reference to the same logger object. Loggers that are further down in the hierarchical list are children of loggers higher up in the list. For example, given a logger with a name of foo, loggers with names of foo.bar, foo.bar.baz, and foo.bam are all descendants of foo. - Loggers documentation

If the level is not set on a logger, the level of the parent is used.

Loggers have a concept of effective level. If a level is not explicitly set on a logger, the level of its parent is used instead as its effective level. If the parent has no explicit level set, its parent is examined, and so on - all ancestors are searched until an explicitly set level is found. The root logger always has an explicit level set (WARNING by default). When deciding whether to process an event, the effective level of the logger is used to determine whether the event is passed to the logger’s handlers. - Loggers documentation

import logging

foo_logger = logging.getLogger('foo')
foo_logger.setLevel(20)

foo_bar_logger = logging.getLogger('foo.bar')

print foo_logger.getEffectiveLevel()
print foo_bar_logger.getEffectiveLevel()

20
20

Similarly, if a handler is not defined for a logger, the handlers of its ancestors are still used because messages propagate up the hierarchy.

Child loggers propagate messages up to the handlers associated with their ancestor loggers. Because of this, it is unnecessary to define and configure handlers for all the loggers an application uses. It is sufficient to configure handlers for a top-level logger and create child loggers as needed. (You can, however, turn off propagation by setting the propagate attribute of a logger to False.) - Loggers documentation

import logging

myformatter = logging.Formatter("MY HANDLER: %(name)s - %(message)s")

myhandler = logging.StreamHandler()
myhandler.setFormatter(myformatter)

foo_logger = logging.getLogger('foo')
foo_logger.addHandler(myhandler)

foo_bar_logger = logging.getLogger('foo.bar')

foo_logger.error('asdfasdf')
foo_bar_logger.error('zxcvzxcv')

MY HANDLER: foo - asdfasdf
MY HANDLER: foo.bar - zxcvzxcv

However, filters, unlike levels and handlers, do not propagate. If a filter is not defined for a logger, the filter of the parent is NOT used.

Note that filters attached to handlers are consulted before an event is emitted by the handler, whereas filters attached to loggers are consulted whenever an event is logged (using debug(), info(), etc.), before sending an event to handlers. This means that events which have been generated by descendant loggers will not be filtered by a logger’s filter setting, unless the filter has also been applied to those descendant loggers. - Filter Objects documentation
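
Here is a small sketch of that behavior, written for Python 3. The DropAll filter, the ListHandler class, and the 'filterdemo' logger names are made up for this example. The filter is attached only to the parent logger, so it rejects the parent's own message but never sees the child's message, which still reaches the parent's handler.

```python
import logging


class DropAll(logging.Filter):
    """Hypothetical filter that rejects every record."""

    def filter(self, record):
        return False


class ListHandler(logging.Handler):
    """Collect messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
parent = logging.getLogger('filterdemo')
child = logging.getLogger('filterdemo.child')
parent.addHandler(handler)
parent.addFilter(DropAll())

parent.error('from parent')  # rejected by the filter on the parent logger
child.error('from child')    # the child has no filter, so this gets through

print(handler.messages)  # ['from child']
```

Attaching the filter to the handler instead (handler.addFilter) would apply it to records from both loggers.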

Subdomain-based configuration for a Flask local development server

2014-02-07T15:09:16-08:00

This example shows how to set up a Flask local development server to use a different configuration based on the subdomain of the request. The project I work on has several environments (dev, qa, staging, etc). Each environment has different database and API hostnames. I use this to switch between database and API environments quickly while using my local development server.

This assumes a create_app function is used to create the Flask application instance as described in the Application Factories Flask documentation.

create_app

Modify the create_app function to take a configobj argument and use it to override the default configuration.

from flask import Flask


def create_app(configobj=None):
    app = Flask(__name__)

    # Default configuration
    app.config.from_object(__name__)

    # Override configuration using config passed into create_app
    if configobj:
        app.config.from_object(configobj)

    return app

SubdomainDispatcher

The SubdomainDispatcher is taken from the Application Dispatching Flask documentation. It is WSGI middleware that looks at the subdomain of the request and returns a different application instance for each subdomain. It calls the create_app function above and passes it the appropriate configuration object for the subdomain.

from threading import Lock


class SubdomainDispatcher(object):

    def __init__(self, create_app, domain=''):
        """
        :param create_app: a function that returns a `flask.Flask` instance
        :param domain: str - used to determine the subdomain
        """
        self.create_app = create_app
        self.domain = domain
        self.lock = Lock()
        self.instances = {}

    def __call__(self, environ, start_response):
        app = self._get_application(environ['HTTP_HOST'])
        return app(environ, start_response)

    def _get_application(self, host):
        host = host.split(':')[0]
        assert host.endswith(self.domain), 'Configuration error'
        subdomain = host[:-len(self.domain)].rstrip('.')
        with self.lock:
            app = self.instances.get(subdomain)
            if app is None:
                configobj = self._get_subdomain_based_config(subdomain)
                app = self.create_app(configobj=configobj)
                self.instances[subdomain] = app
            return app

    @staticmethod
    def _get_subdomain_based_config(subdomain):

        class Config(object):
            pass
        config = Config()

        if subdomain == 'dev':
            config.API_HOST = 'dev-host'
            config.DB_SERVER = 'dev-db-server'
        elif subdomain == 'qa':
            config.API_HOST = 'qa-host'
            config.DB_SERVER = 'qa-db-server'

        return config
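
The subdomain parsing is the part that is easiest to get wrong, so here is a standalone sketch of just that step. extract_subdomain is a hypothetical helper that mirrors the string handling in _get_application; it is not part of Flask or of the dispatcher above. It assumes a non-empty domain.

```python
def extract_subdomain(host, domain):
    """Return the subdomain part of a Host header value.

    Hypothetical helper mirroring SubdomainDispatcher._get_application.
    Assumes `domain` is a non-empty string.
    """
    host = host.split(':')[0]  # drop the port, if any
    if not host.endswith(domain):
        raise ValueError('Configuration error')
    return host[:-len(domain)].rstrip('.')


print(extract_subdomain('dev.localhost:5000', 'localhost'))    # dev
print(extract_subdomain('qa.localhost:5000', 'localhost'))     # qa
print(repr(extract_subdomain('localhost:5000', 'localhost')))  # ''
```

An empty subdomain falls through both branches of _get_subdomain_based_config, so the app keeps its default configuration.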

rundevserver

rundevserver is similar to flask.Flask.run but uses the SubdomainDispatcher middleware before calling werkzeug.serving.run_simple.

def rundevserver(host=None, port=None, domain='', debug=True, **options):
    """
    Modified from `flask.Flask.run`

    Runs the application on a local development server.

    :param host: the hostname to listen on. Set this to ``'0.0.0.0'`` to
                 have the server available externally as well. Defaults to
                 ``'127.0.0.1'``.
    :param port: the port of the webserver. Defaults to ``5000``
    :param domain: used to determine the subdomain
    :param debug: if given, enable or disable debug mode.
                  See :attr:`debug`.
    :param options: the options to be forwarded to the underlying
                    Werkzeug server. See
                    :func:`werkzeug.serving.run_simple` for more
                    information.
    """
    from werkzeug.serving import run_simple

    if host is None:
        host = '127.0.0.1'
    if port is None:
        port = 5000
    options.setdefault('use_reloader', debug)
    options.setdefault('use_debugger', debug)

    app = SubdomainDispatcher(create_app, domain)

    run_simple(host, port, app, **options)

Usage

Add the following to your hosts file (/etc/hosts on Ubuntu):
```
127.0.0.1 dev.localhost
127.0.0.1 qa.localhost
```

Run the local dev server:

if __name__ == '__main__':
    rundevserver(host='0.0.0.0', port=5000, domain='localhost')

Use the following URLs to get different app configurations:
- http://localhost:5000 (default configuration)
- http://dev.localhost:5000
- http://qa.localhost:5000

Github code

A full working example was originally located on GitHub as flask-subdomain-dispatcher-example. An updated version of the code is here: flask-subdomaindevserver.

How to add a margin around markers in the Google Static Maps API using Python

2013-11-26T07:31:21-08:00

This example shows how to use Python to generate a Google Static Map URL for a map that contains markers within some dimensions which are smaller than the map image dimensions. This effectively allows for setting minimum X and Y margins around the markers in a map. This is useful for a "fluid" web design where a maximum map size is requested from Google and is then cut off at the edges for small browser windows.

The bulk of this solution is based on the Javascript code here: http://stackoverflow.com/questions/6048975/google-maps-v3-how-to-calculate-the-zoom-level-for-a-given-bounds

import math


def generate_map_url(
        min_map_width_px,
        max_map_width_px,
        min_map_height_px,
        max_map_height_px,
        marker_groups):
    """
    Return a Google Static Map URL for a map that contains markers within
    some dimensions which are smaller than the map image dimensions. This
    effectively allows for setting minimum X and Y margins around the markers
    in a map. This is useful for a "fluid" web design where a maximum map
    size is requested from Google and is then cut off at the edges for
    small browser windows.
    """
    # Determine the maximum zoom to contain markers at the minimum map size
    lat_list = [
        lat for markers in marker_groups for lat, lng in markers['lat_lng']]
    lng_list = [
        lng for markers in marker_groups for lat, lng in markers['lat_lng']]
    max_zoom = get_zoom_to_fit(
        min(lat_list), max(lat_list), min(lng_list), max(lng_list),
        min_map_width_px, min_map_height_px,
    )

    # Build the markers query string arguments
    markers_args = ''
    for markers in marker_groups:
        lat_lng = '|'.join([
            '{},{}'.format(lat, lng) for lat, lng in markers['lat_lng']])
        markers_args += '&markers=color:{}|{}'.format(markers['color'], lat_lng)

    # Build and return the map URL
    return ''.join([
        'http://maps.googleapis.com/maps/api/staticmap',
        '?sensor=false&v=3&visual_refresh=true',
        '&size={}x{}&zoom={}'.format(
            max_map_width_px, max_map_height_px, max_zoom),
        markers_args,
    ])


def get_zoom_to_fit(min_lat, max_lat, min_lng, max_lng, width_px, height_px):
    """
    Return the maximum zoom that will fit the given min/max lat/lng
    coordinates in a map of the given dimensions. This is used to
    override the zoom set by Google's implicit positioning.

    Calculation translated from Javascript to Python from:
    http://stackoverflow.com/questions/6048975/google-maps-v3-how-to-calculate-the-zoom-level-for-a-given-bounds
    """
    GOOGLE_WORLD_WIDTH = 256
    GOOGLE_WORLD_HEIGHT = 256
    MAX_ZOOM = 17

    def lat2rad(lat):
        sinlat = math.sin(math.radians(lat))
        radx2 = math.log((1 + sinlat) / (1 - sinlat)) / 2.0
        return max(min(radx2, math.pi), -math.pi) / 2.0

    def zoom(map_px, world_px, fraction):
        # Use int() to round down to the nearest integer
        return int(
            math.log(float(map_px) / float(world_px) / fraction)
            / math.log(2.0)
        )

    # Determine the maximum zoom based on height and latitude
    if min_lat == max_lat:
        lat_zoom = MAX_ZOOM
    else:
        lat_fraction = (lat2rad(max_lat) - lat2rad(min_lat)) / math.pi
        lat_zoom = zoom(height_px, GOOGLE_WORLD_HEIGHT, lat_fraction)

    # Determine the maximum zoom based on width and longitude
    if min_lng == max_lng:
        lng_zoom = MAX_ZOOM
    else:
        lng_range = max_lng - min_lng
        if lng_range < 0:
            lng_range += 360.0
        lng_fraction = lng_range / 360.0
        lng_zoom = zoom(width_px, GOOGLE_WORLD_WIDTH, lng_fraction)

    return min(lat_zoom, lng_zoom, MAX_ZOOM)

Here is an example:

map_url = generate_map_url(
    min_map_width_px=240, max_map_width_px=380,
    min_map_height_px=285, max_map_height_px=325,
    marker_groups=[
        {'color': 'blue',
         'lat_lng': [(34.0993, -118.8394)]},
        {'color': 'orange',
         'lat_lng': [
             (34.3997, -119.2002),
             (34.5389, -118.4499),
             (34.0983, -118.1285),
             (33.5932, -117.9455),
             (33.8322, -117.3958),
         ]}
    ]
)
print map_url

Here is a map without the margin: http://maps.googleapis.com/maps/api/staticmap?sensor=false&v=3&visual_refresh=true&size=380x325&markers=color:blue|34.0993,-118.8394&markers=color:orange|34.3997,-119.2002|34.5389,-118.4499|34.0983,-118.1285|33.5932,-117.9455|33.8322,-117.3958

Here is the result with the margin: http://maps.googleapis.com/maps/api/staticmap?sensor=false&v=3&visual_refresh=true&size=380x325&zoom=7&markers=color:blue|34.0993,-118.8394&markers=color:orange|34.3997,-119.2002|34.5389,-118.4499|34.0983,-118.1285|33.5932,-117.9455|33.8322,-117.3958
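
The zoom=7 in the URL above can be checked by hand. This is a Python 3 sketch that repeats the arithmetic from get_zoom_to_fit for the example markers and the minimum map size of 240x285. The variable names are mine, and math.log2 stands in for the log/log(2) division.

```python
import math

# Bounds of the markers in the example above.
min_lat, max_lat = 33.5932, 34.5389
min_lng, max_lng = -119.2002, -117.3958
# Minimum map size from the example; the world is 256 px wide at zoom 0.
width_px, height_px, world_px = 240, 285, 256


def lat2rad(lat):
    # Same Mercator projection step as in get_zoom_to_fit.
    sinlat = math.sin(math.radians(lat))
    radx2 = math.log((1 + sinlat) / (1 - sinlat)) / 2.0
    return max(min(radx2, math.pi), -math.pi) / 2.0


# Fraction of the world covered by the markers in each direction.
lat_fraction = (lat2rad(max_lat) - lat2rad(min_lat)) / math.pi
lng_fraction = (max_lng - min_lng) / 360.0

lat_zoom = int(math.log2(height_px / world_px / lat_fraction))
lng_zoom = int(math.log2(width_px / world_px / lng_fraction))

print(lat_zoom, lng_zoom, min(lat_zoom, lng_zoom))  # 8 7 7
```

Longitude is the limiting direction here, so the map is drawn at zoom 7.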

Mock notes

2013-04-15T13:01:56-07:00

Mocking different return values for multiple calls

Docs: http://www.voidspace.org.uk/python/mock/examples.html#multiple-calls-with-different-effects

import unittest

from mock import patch


def myfunc():
    pass


class MyTestCase(unittest.TestCase):

    @staticmethod
    def multiple_return_values(*return_values):
        return_values = list(return_values)

        def inner(*args, **kwargs):
            return return_values.pop(0)
        return inner

    @patch('%s.myfunc' % __name__)
    def test_something(self, myfunc_mock):
        myfunc_mock.side_effect = self.multiple_return_values(
            'return 1', 'return 2')

        r1 = myfunc()
        r2 = myfunc()
        self.assertEqual(r1, 'return 1')
        self.assertEqual(r2, 'return 2')
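
With newer versions of mock (and unittest.mock in the Python 3 standard library), side_effect also accepts an iterable, so the helper above is optional. A minimal Python 3 sketch, calling the mock directly instead of patching a module function:

```python
from unittest.mock import Mock

# Each call returns the next item from the iterable.
myfunc_mock = Mock(side_effect=['return 1', 'return 2'])

r1 = myfunc_mock()
r2 = myfunc_mock()
print(r1, r2)  # return 1 return 2
```

A third call would raise StopIteration because the iterable is exhausted.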

Do you have a lot of short, single-use, private functions in your Python code?

2013-04-12T23:32:52-07:00

Do you have a lot of short, single-use, private functions in your Python code? For example, below is some stubbed out authentication code I've been working on. It checks if a user's password is correct and updates the hash algorithm to use bcrypt. The 4 private functions with the leading underscore are from 1 to 10 lines long and are only used by the check_password function. These functions are part of a larger module with about 20 functions. I don't like that these 4 functions add clutter to the module and are not grouped with the function that uses them, check_password.

def _get_password_hash_from_db(email_address):
    """Get the user's password hash from the database.
    """


def _determine_password_hash_algorithm(password_hash):
    """Determine the hash algorithm.
    """


def _hash_password_old(password):
    """This is the OLD password hash algorithm.
    """


def _hash_existing_password_bcrypt(password, db_password_hash):
    """This is the NEW algorithm used for hashing existing passwords.
    """


def check_password(email_address, password):
    """Check if a user's supplied password is correct.
    """
    db_password_hash = _get_password_hash_from_db(email_address)
    hash_alg = _determine_password_hash_algorithm(db_password_hash)
    if hash_alg == 'BCRYPT':
        input_password_hash = _hash_existing_password_bcrypt(password, db_password_hash)
    else:
        input_password_hash = _hash_password_old(password)
    password_correct = (input_password_hash == db_password_hash)
    if password_correct and hash_alg != 'BCRYPT':
        call_change_password(email_address, password)
    return password_correct


def call_change_password(email_address, new_password):
    """Change the user's password.
    """

Sometimes, in cases like this, I move the 4 private functions to be nested functions inside check_password. I like how the functions are grouped together and that the module is not littered with extraneous functions. However, the inner functions are not easily testable and I don't see many people doing this.

def check_password(email_address, password):
    """Check if a user's supplied password is correct.
    """

    def get_password_hash_from_db(email_address):
        """Get the user's password hash from the database.
        """

    def determine_password_hash_algorithm(password_hash):
        """Determine the hash algorithm.
        """

    def hash_password_old(password):
        """This is the OLD password hash algorithm.
        """

    def hash_existing_password_bcrypt(password, db_password_hash):
        """This is the NEW algorithm used for hashing existing passwords.
        """

    db_password_hash = get_password_hash_from_db(email_address)
    hash_alg = determine_password_hash_algorithm(db_password_hash)
    if hash_alg == 'BCRYPT':
        input_password_hash = hash_existing_password_bcrypt(password, db_password_hash)
    else:
        input_password_hash = hash_password_old(password)
    password_correct = (input_password_hash == db_password_hash)
    if password_correct and hash_alg != 'BCRYPT':
        call_change_password(email_address, password)
    return password_correct


def call_change_password(email_address, new_password):
    """Change the user's password.
    """

Another option is to create a PasswordChecker class instead. This seems the most powerful and now the private methods are testable. However, this adds more overhead and I hear Jack Diederich telling me to Stop Writing Classes!

class _PasswordChecker(object):
    """Check if a user's supplied password is correct.
    """

    @staticmethod
    def _get_password_hash_from_db(email_address):
        """Get the user's password hash from the database.
        """

    @staticmethod
    def _determine_password_hash_algorithm(password_hash):
        """Determine the hash algorithm.
        """

    @staticmethod
    def _hash_password_old(password):
        """This is the OLD password hash algorithm.
        """

    @staticmethod
    def _hash_existing_password_bcrypt(password, db_password_hash):
        """This is the NEW algorithm used for hashing existing passwords.
        """

    def __call__(self, email_address, password):
        db_password_hash = self._get_password_hash_from_db(email_address)
        hash_alg = self._determine_password_hash_algorithm(db_password_hash)
        if hash_alg == 'BCRYPT':
            input_password_hash = self._hash_existing_password_bcrypt(password, db_password_hash)
        else:
            input_password_hash = self._hash_password_old(password)
        password_correct = (input_password_hash == db_password_hash)
        if password_correct and hash_alg != 'BCRYPT':
            call_change_password(email_address, password)
        return password_correct


check_password = _PasswordChecker()


def call_change_password(email_address, new_password):
    """Change the user's password.
    """

Maybe the solution is to break up the module into smaller modules which act like the class above? However this might leave me with some unevenly sized modules. How do you handle this?

How to expose a Flask local development server to the public using SSH remote port forwarding

2013-02-12T18:50:36-08:00

Here is how to run a Flask local development server on your local machine and expose it to the public via a remote server you have control over. This uses SSH remote port forwarding, which is the converse of the local port forwarding described here: How to run a Django local development server on a remote machine and access it in your browser on your local machine using SSH port forwarding

On the remote host, edit the sshd_config file (mine was located at /etc/ssh/sshd_config) to allow remote hosts to connect to ports forwarded for the client:
```
GatewayPorts yes
```
On the remote host, restart the SSH server:
```
$ sudo service sshd restart 
```

On the local host, SSH to the remote host:

$ ssh -v -R 50051:localhost:5000 eliot@my.remotehost.com

On the local host, run the Flask dev server:
```
$ python runserver.py localhost 5000 
```
Go to http://my.remotehost.com:50051 in the browser

Using RemoteForward in your ~/.ssh/config

You can also achieve the same results by using the RemoteForward in your ~/.ssh/config file:

Host myremote
  User eliot
  HostName my.remotehost.com
  RemoteForward 50051 localhost:5000

When is the try-finally block used in Python?

2013-02-01T16:33:07-08:00

The finally block is used to define clean-up actions. Why is the finally block needed? Why can't the clean up actions be put after the try/except/else block? This works in some cases, but if there is a return, break, or continue, or an unhandled exception inside the try, except, or else clauses, that code will never be executed. The finally block executes even in these conditions.

try:
    print 'Inside try'
    raise Exception
finally:
    print 'Inside finally'
print 'Never get here'

Results:

Inside try
Inside finally
Traceback (most recent call last):
  File "tmp.py", line 13, in <module>
    raise Exception
Exception
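
The same thing happens with a return inside the try block. Here is a small Python 3 sketch; the function name and the log list are made up for the example. The finally block runs before the function returns, and the line after the try statement is never reached.

```python
def return_from_try(log):
    try:
        log.append('try')
        return 'returned'
    finally:
        # Runs even though the try block is returning.
        log.append('finally')
    log.append('after')  # never reached


log = []
result = return_from_try(log)
print(result, log)  # returned ['try', 'finally']
```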

Reference: http://docs.python.org/2/tutorial/errors.html#defining-clean-up-actions

Using Python's gzip and StringIO to compress data in memory

2012-11-07T22:08:28-08:00

I needed to gzip some data in memory that would eventually end up saved to disk as a .gz file. I thought, That's easy, just use Python's built in gzip module.

However, I needed to pass the data to pycurl as a file-like object. I didn't want to write the data to disk and then read it again just to pass to pycurl. I thought, That's easy also-- just use Python's cStringIO module.

The solution did end up being simple, but figuring out the solution was a lot harder than I thought. Below is my roundabout process of finding the simple solution.

Here is my setup/test code. I am running Python 2.7.3 on Ubuntu 12.04.

import cStringIO
import gzip


STUFF_TO_GZIP = """Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?"""
FILENAME = 'myfile.json.gz'


def pycurl_simulator(fileobj):

    # Get the file size
    fileobj.seek(0, 2)
    filesize = fileobj.tell()
    fileobj.seek(0, 0)

    # Read the file data
    fout = open(FILENAME, 'wb')
    fout.write(fileobj.read())
    fout.close()

    return filesize

Try 1: seek from the end fails

Here is my first attempt using cStringIO with the gzip module.

def try1_seek_from_end_fails():

    ftemp = cStringIO.StringIO()
    fgzipped = gzip.GzipFile(
        filename=FILENAME, mode='wb', fileobj=ftemp)
    fgzipped.write(STUFF_TO_GZIP)
    filesize = pycurl_simulator(fgzipped)
    print filesize

I got this exception:

        Traceback (most recent call last):
          File "tmp.py", line 232, in <module>
            try1_seek_from_end_fails()
          File "tmp.py", line 83, in try1_seek_from_end_fails
            filesize = pycurl_simulator(fgzipped)
          File "tmp.py", line 25, in pycurl_simulator
            fileobj.seek(0, 2)
          File "/usr/lib/python2.7/gzip.py", line 415, in seek
            raise ValueError('Seek from end not supported')
        ValueError: Seek from end not supported

It turns out the gzip object doesn't support seeking from the end. See this thread on the Python mailing list: http://mail.python.org/pipermail/python-list/2009-January/519398.html

Try 2: data is not compressed

What if we don't seek() from the end and just tell() where we are? (It should be at the end after doing a write(), right?) Unfortunately, this gave me the uncompressed size.

Reading from the GzipFile object also gave me an error saying that I couldn't read from a writable object.

def try2_data_is_not_compressed():

    ftemp = cStringIO.StringIO()
    fgzipped = gzip.GzipFile(
        filename=FILENAME, mode='wb', fileobj=ftemp)
    fgzipped.write(STUFF_TO_GZIP)
    filesize = fgzipped.tell()
    print filesize

Try 5: file much too small

I googled, then looked at the source code for gzip.py. I found that the compressed data was in the StringIO object. So I performed my file operations on it instead of the GzipFile object. Now I was able to write the data out to a file. However, the size of the file was much too small.

def try5_file_much_too_small():

    fgz = cStringIO.StringIO()
    gzip_obj = gzip.GzipFile(
        filename=FILENAME, mode='wb', fileobj=fgz)
    gzip_obj.write(STUFF_TO_GZIP)
    filesize = pycurl_simulator(fgz)
    print filesize

Try 6: unexpected end of file

I saw there was a flush() method in the source code, so I added a call to flush(). This time I got a reasonable file size. However, when trying to gunzip it from the command line, I got the following error:

        gzip: myfile.json.gz: unexpected end of file

def try6_unexpected_end_of_file():

    fgz = cStringIO.StringIO()
    gzip_obj = gzip.GzipFile(
        filename=FILENAME, mode='wb', fileobj=fgz)
    gzip_obj.write(STUFF_TO_GZIP)
    gzip_obj.flush()
    filesize = pycurl_simulator(fgz)
    print filesize

Try 7: got it working

I knew that GzipFile worked properly when writing files directly as opposed to reading from the StringIO object. It turns out the difference was that there was code in the close() method of GzipFile which wrote some extra required data. Now stuff was working.

def try7_got_it_working():

    fgz = cStringIO.StringIO()
    gzip_obj = gzip.GzipFile(
        filename=FILENAME, mode='wb', fileobj=fgz)
    gzip_obj.write(STUFF_TO_GZIP)
    gzip_obj.flush()

    # Do stuff that GzipFile.close() does
    gzip_obj.fileobj.write(gzip_obj.compress.flush())
    gzip.write32u(gzip_obj.fileobj, gzip_obj.crc)
    gzip.write32u(gzip_obj.fileobj, gzip_obj.size & 0xffffffffL)

    filesize = pycurl_simulator(fgz)
    print filesize
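
The extra required data is the gzip trailer: the CRC-32 of the uncompressed data followed by its size, each stored as a 4-byte little-endian integer. Here is a Python 3 sketch that checks this, using io.BytesIO in place of cStringIO. The payload and variable names are made up for the example.

```python
import gzip
import io
import struct
import zlib

payload = b'some data to compress ' * 20

buf = io.BytesIO()
gz = gzip.GzipFile(filename='myfile.json.gz', mode='wb', fileobj=buf)
gz.write(payload)
gz.flush()
size_before_close = len(buf.getvalue())

# close() writes the remaining compressed data plus the 8-byte trailer.
# buf stays open because it was passed in as fileobj.
gz.close()
data = buf.getvalue()

crc, isize = struct.unpack('<II', data[-8:])
print(crc == zlib.crc32(payload), isize == len(payload))  # True True
```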

Try 8: (not really) final version

Here's the (not really) final version using a subclass of GzipFile that adds a method to write the extra data at the end. It also overrides close() so that stuff isn't written twice in case you need to use close(). Also, the separate flush() call is not needed.

def try8_not_really_final_version():

    class MemoryGzipFile(gzip.GzipFile):
        """
        A GzipFile subclass designed to be used with in memory file like
        objects, i.e. StringIO objects.
        """

        def write_crc_and_filesize(self):
            """
            Flush and write the CRC and filesize. Normally this is done
            in the close() method. However, for in memory file objects,
            doing this in close() is too late.
            """
            self.fileobj.write(self.compress.flush())
            gzip.write32u(self.fileobj, self.crc)
            # self.size may exceed 2GB, or even 4GB
            gzip.write32u(self.fileobj, self.size & 0xffffffffL)

        def close(self):
            if self.fileobj is None:
                return
            self.fileobj = None
            if self.myfileobj:
                self.myfileobj.close()
                self.myfileobj = None

    fgz = cStringIO.StringIO()
    gzip_obj = MemoryGzipFile(
        filename=FILENAME, mode='wb', fileobj=fgz)
    gzip_obj.write(STUFF_TO_GZIP)
    gzip_obj.write_crc_and_filesize()

    filesize = pycurl_simulator(fgz)
    print filesize

Try 9: didn't need to do that (final version)

It turns out I can close the GzipFile object and the StringIO object remains available. So that MemoryGzipFile class above is completely unnecessary. I am dumb. Here is the final iteration:

def try9_didnt_need_to_do_that():

    fgz = cStringIO.StringIO()
    gzip_obj = gzip.GzipFile(
        filename=FILENAME, mode='wb', fileobj=fgz)
    gzip_obj.write(STUFF_TO_GZIP)
    gzip_obj.close()

    filesize = pycurl_simulator(fgz)
    print filesize
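
For reference, here is roughly what the final version might look like in Python 3, where cStringIO is gone and the data has to be bytes. This is a sketch, not the code I ran at the time. gzip_to_memory is a name I made up, and io.BytesIO replaces cStringIO.StringIO.

```python
import gzip
import io


def gzip_to_memory(data, filename='myfile.json.gz'):
    """Return a BytesIO holding a complete gzip stream of `data` (bytes)."""
    buf = io.BytesIO()
    # Leaving the with block closes the GzipFile, which writes the trailer.
    # buf stays open because it was passed in as fileobj.
    with gzip.GzipFile(filename=filename, mode='wb', fileobj=buf) as gz:
        gz.write(data)
    buf.seek(0)
    return buf


fgz = gzip_to_memory(b'{"hello": "world"}')
print(gzip.decompress(fgz.read()))  # b'{"hello": "world"}'
```

The returned object supports seek(0, 2) and tell(), so it can be handed to something like pycurl_simulator above.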

How to use pip with crate.io

2012-10-24T10:44:56-07:00

Here's how to use pip with crate.io (in case pypi.python.org goes down):

$ pip install --index-url=https://simple.crate.io yolk

Or with logging to see what's happening:

$ pip install --log=my-pip-debug.log --index-url=https://simple.crate.io yolk

How to run a Django local development server on a remote machine and access it in your browser on your local machine using SSH port forwarding

2012-10-23T18:11:59-07:00

Here is how to run a Django local development server on a remote machine and access it in your browser on your local machine using SSH port forwarding. (This is useful if there is a firewall blocking access to the port of your Django local dev server, port 8000.)

On the local host, SSH to the remote host:

$ ssh -v -L 9000:localhost:8000 eliot@my.remotehost.com

On the remote host, run the Django dev server:

eliot@my.remotehost.com:/path/to/my/django/project$ python manage.py runserver 0.0.0.0:8000

On the local host, go to http://localhost:9000 in the browser

Note: The local port and the remote port can be the same (i.e. you can use 8000 instead of 9000). I just made them different to show which port is which.

Using LocalForward in your ~/.ssh/config

You can also achieve the same results by using the LocalForward in your ~/.ssh/config file:

Host myremote
  User eliot
  HostName my.remotehost.com
  LocalForward 9000 localhost:8000

Reference

http://magazine.redhat.com/2007/11/06/ssh-port-forwarding/

Testing HTTPS w/ Flask's development server using stunnel on Ubuntu

2012-10-17T17:30:11-07:00

Our website is served over HTTPS. To more easily test certain issues (e.g. mixed mode content warnings, or Mapquest SSL tile servers), I wanted to access my Flask local development server over HTTPS. These two articles describe how to do this using stunnel: Testing HTTPS with Django's Development Server, Django Development Server with HTTPS. Using stunnel, you can hit pages on your Django/Flask local dev server over HTTPS instead of HTTP. Here is how I installed it on Ubuntu Precise 12.04:

Install SSL development files
```
$ sudo apt-get install libssl-dev 
```
Go to https://www.stunnel.org/downloads.html and download stunnel-4.54.tar.gz
Unpack, compile, install.
```
$ tar xvf stunnel-4.54.tar.gz 
$ cd stunnel-4.54 
$ ./configure --prefix=/home/saltycrane/lib/stunnel-4.54 
$ make 
$ make install 
```
NOTE: the make install step asked me a number of questions and created a certificate file at /home/saltycrane/lib/stunnel-4.54/etc/stunnel/stunnel.pem. Accept all the defaults for the certificate information (accurate certificate information isn't needed for this application).

Create a stunnel configuration file, /home/saltycrane/lib/stunnel-4.54/etc/stunnel/dev_https:

pid =
cert = /home/saltycrane/lib/stunnel-4.54/etc/stunnel/stunnel.pem
debug = 7
foreground = yes

[https]
accept = 7000
connect = 5000

Start stunnel:

$ /home/saltycrane/lib/stunnel-4.54/bin/stunnel /home/saltycrane/lib/stunnel-4.54/etc/stunnel/dev_https
2012.10.17 17:40:52 LOG7[12468:140357811214080]: Clients allowed=500
2012.10.17 17:40:52 LOG5[12468:140357811214080]: stunnel 4.54 on x86_64-unknown-linux-gnu platform
2012.10.17 17:40:52 LOG5[12468:140357811214080]: Compiled/running with OpenSSL 1.0.1 14 Mar 2012
2012.10.17 17:40:52 LOG5[12468:140357811214080]: Threading:PTHREAD SSL:+ENGINE+OCSP Auth:none Sockets:POLL+IPv6
2012.10.17 17:40:52 LOG5[12468:140357811214080]: Reading configuration from file /home/saltycrane/lib/stunnel-4.54/etc/stunnel/dev_https
2012.10.17 17:40:52 LOG7[12468:140357811214080]: Compression not enabled
2012.10.17 17:40:52 LOG7[12468:140357811214080]: Snagged 64 random bytes from /home/saltycrane/.rnd
2012.10.17 17:40:52 LOG7[12468:140357811214080]: Wrote 1024 new random bytes to /home/saltycrane/.rnd
2012.10.17 17:40:52 LOG7[12468:140357811214080]: PRNG seeded successfully
2012.10.17 17:40:52 LOG6[12468:140357811214080]: Initializing service [https]
2012.10.17 17:40:52 LOG7[12468:140357811214080]: Certificate: /home/saltycrane/lib/stunnel-4.54/etc/stunnel/stunnel.pem
2012.10.17 17:40:52 LOG7[12468:140357811214080]: Certificate loaded
2012.10.17 17:40:52 LOG7[12468:140357811214080]: Key file: /home/saltycrane/lib/stunnel-4.54/etc/stunnel/stunnel.pem
2012.10.17 17:40:52 LOG7[12468:140357811214080]: Private key loaded
2012.10.17 17:40:52 LOG7[12468:140357811214080]: Using DH parameters from /home/saltycrane/lib/stunnel-4.54/etc/stunnel/stunnel.pem
2012.10.17 17:40:52 LOG7[12468:140357811214080]: DH initialized with 1024-bit key
2012.10.17 17:40:52 LOG7[12468:140357811214080]: ECDH initialized with curve prime256v1
2012.10.17 17:40:52 LOG7[12468:140357811214080]: SSL options set: 0x00000004
2012.10.17 17:40:52 LOG5[12468:140357811214080]: Configuration successful
2012.10.17 17:40:52 LOG7[12468:140357811214080]: Service [https] (FD=7) bound to 0.0.0.0:7000
2012.10.17 17:40:52 LOG7[12468:140357811214080]: No pid file being created

Start the python dev server:

$ HTTPS=1 python bin/runserver.py 0.0.0.0 5000

Go to https://localhost:7000 in your browser

python enum types

2012-10-10T18:52:36-07:00

import operator


class EnumValue(object):

    def __init__(self, parent_name, name, value):
        self._parent_name = parent_name
        self._name = name
        self._value = value

    def _parents_equal(self, other):
        return (
            hasattr(other, '_parent_name')
            and self._parent_name == other._parent_name)

    def _check_parents_equal(self, other):
        if not self._parents_equal(other):
            raise TypeError(
                'This operation is valid only for enum values of the same type')

    def __eq__(self, other):
        return self._parents_equal(other) and self._value == other._value

    def __ne__(self, other):
        return not self.__eq__(other)

    def __lt__(self, other):
        self._check_parents_equal(other)
        return self._value < other._value

    def __le__(self, other):
        self._check_parents_equal(other)
        return self._value <= other._value

    def __gt__(self, other):
        self._check_parents_equal(other)
        return self._value > other._value

    def __ge__(self, other):
        self._check_parents_equal(other)
        return self._value >= other._value

    def __hash__(self):
        return hash(self._parent_name + str(self._value))

    def __repr__(self):
        return '{}({!r}, {!r}, {!r})'.format(
            self.__class__.__name__, self._parent_name, self._name, self._value)

    def __int__(self):
        return int(self._value)

    def __str__(self):
        return str(self._name)


class EnumMetaclass(type):

    def __new__(cls, name, bases, dct):
        uppercased = dict((k.upper(), v) for k, v in dct.items())
        new_dct = dict(
            name=name,
            _enums_by_str=dict(
                (k, EnumValue(name, k, v)) for k, v in uppercased.items()),
            _enums_by_int=dict(
                (v, EnumValue(name, k, v)) for k, v in uppercased.items()),
        )
        return super(EnumMetaclass, cls).__new__(cls, name, bases, new_dct)

    def __getattr__(cls, name):
        try:
            return cls.__getitem__(name)
        except KeyError:
            raise AttributeError(name)

    def __getitem__(cls, name):
        try:
            name = name.upper()
        except AttributeError:
            pass
        try:
            return cls._enums_by_str[name]
        except KeyError:
            return cls._enums_by_int[name]

    def __repr__(cls):
        return '{}({!r}, {})'.format(
            cls.__class__.__name__,
            cls.name,
            ', '.join('{}={}'.format(v._name, v._value)
                      for v in sorted(cls._enums_by_str.values())))

    def values(cls):
        return sorted(cls._enums_by_str.values())

    def _values_comparison(cls, item, comparison_operator):
        """
        Return a list of values such that comparison_operator(value, item) is
        True.
        """
        return sorted(
            [v for v in cls._enums_by_str.values()
             if comparison_operator(v, item)])

    def values_lt(cls, item):
        return cls._values_comparison(item, operator.lt)

    def values_le(cls, item):
        return cls._values_comparison(item, operator.le)

    def values_gt(cls, item):
        return cls._values_comparison(item, operator.gt)

    def values_ge(cls, item):
        return cls._values_comparison(item, operator.ge)

    def values_ne(cls, item):
        return cls._values_comparison(item, operator.ne)


def enum_factory(name, **kwargs):
    return EnumMetaclass(name, (), kwargs)
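
For comparison, Python 3.4 and later ship an enum module in the standard library. The following is only a sketch of roughly equivalent usage, not a drop-in replacement for the classes above: a plain enum.Enum does not support the ordering comparisons or the values_lt() style helpers, and str() of a member returns 'ProfileAction.VIEW' rather than 'VIEW'.

```python
import enum

# Functional API: names are assigned the values 1, 2, 3, 4 in order.
ProfileAction = enum.Enum(
    'ProfileAction', 'VIEW EDIT_OWN EDIT_PUBLIC EDIT_FULL')
DashboardAction = enum.Enum(
    'DashboardAction', 'VIEW EDIT_OWN EDIT_PUBLIC EDIT_FULL')

# Access by name, by value, or by attribute returns the same member.
by_name = ProfileAction['VIEW']
by_value = ProfileAction(1)
print(by_name is by_value is ProfileAction.VIEW)

# Members of different enums never compare equal, even with the same value.
print(ProfileAction.VIEW == DashboardAction.VIEW)

# The name and value are available as attributes.
print(ProfileAction.EDIT_FULL.name, ProfileAction.EDIT_FULL.value)
```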

Tests:

import unittest


class EnumTestCase(unittest.TestCase):
    def test_repr(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            repr(ProfileAction),
            "EnumMetaclass('ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)")

    def test_value_repr(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            repr(ProfileAction.VIEW), "EnumValue('ProfileAction', 'VIEW', 1)")

    def test_attribute_error(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        with self.assertRaises(AttributeError):
            ProfileAction.ASDFASDF

    def test_cast_to_str(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(str(ProfileAction.VIEW), 'VIEW')

    def test_cast_to_int(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(int(ProfileAction.VIEW), 1)

    def test_access_by_str(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(ProfileAction['VIEW'], ProfileAction.VIEW)

    def test_access_by_int(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(ProfileAction[1], ProfileAction.VIEW)

    def test_equality(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(ProfileAction.VIEW, ProfileAction.VIEW)
        self.assertEqual(ProfileAction['VIEW'], ProfileAction.VIEW)
        self.assertEqual(ProfileAction[1], ProfileAction.VIEW)

    def test_inequality(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertNotEqual(ProfileAction.VIEW, ProfileAction.EDIT_OWN)
        self.assertNotEqual(ProfileAction['VIEW'], ProfileAction.EDIT_OWN)
        self.assertNotEqual(ProfileAction[1], ProfileAction.EDIT_OWN)

        DashboardAction = enum_factory(
            'DashboardAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertNotEqual(ProfileAction.VIEW, DashboardAction.VIEW)

    def test_invalid_comparison(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        DashboardAction = enum_factory(
            'DashboardAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)

        with self.assertRaises(TypeError) as cm:
            ProfileAction.VIEW < DashboardAction.EDIT_OWN

        self.assertEqual(
            str(cm.exception),
            'This operation is valid only for enum values of the same type')

    def test_values(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            ProfileAction.values(), [
                EnumValue('ProfileAction', 'VIEW', 1),
                EnumValue('ProfileAction', 'EDIT_OWN', 2),
                EnumValue('ProfileAction', 'EDIT_PUBLIC', 3),
                EnumValue('ProfileAction', 'EDIT_FULL', 4),
            ])

    def test_values_lt(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            ProfileAction.values_lt(ProfileAction.EDIT_PUBLIC), [
                EnumValue('ProfileAction', 'VIEW', 1),
                EnumValue('ProfileAction', 'EDIT_OWN', 2),
            ])

    def test_values_le(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            ProfileAction.values_le(ProfileAction.EDIT_PUBLIC), [
                EnumValue('ProfileAction', 'VIEW', 1),
                EnumValue('ProfileAction', 'EDIT_OWN', 2),
                EnumValue('ProfileAction', 'EDIT_PUBLIC', 3),
            ])

    def test_values_gt(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            ProfileAction.values_gt(ProfileAction.EDIT_PUBLIC), [
                EnumValue('ProfileAction', 'EDIT_FULL', 4),
            ])

    def test_values_ge(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            ProfileAction.values_ge(ProfileAction.EDIT_PUBLIC), [
                EnumValue('ProfileAction', 'EDIT_PUBLIC', 3),
                EnumValue('ProfileAction', 'EDIT_FULL', 4),
            ])

    def test_values_ne(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        self.assertEqual(
            ProfileAction.values_ne(ProfileAction.EDIT_PUBLIC), [
                EnumValue('ProfileAction', 'VIEW', 1),
                EnumValue('ProfileAction', 'EDIT_OWN', 2),
                EnumValue('ProfileAction', 'EDIT_FULL', 4),
            ])

    def test_intersection_with_same_type(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        set_a = set([ProfileAction.VIEW, ProfileAction.EDIT_OWN])
        set_b = set([ProfileAction.VIEW, ProfileAction.EDIT_PUBLIC])
        self.assertEqual(set_a & set_b, set([ProfileAction.VIEW]))

    def test_intersection_with_different_types(self):
        ProfileAction = enum_factory(
            'ProfileAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)
        DashboardAction = enum_factory(
            'DashboardAction', VIEW=1, EDIT_OWN=2, EDIT_PUBLIC=3, EDIT_FULL=4)

        set_a = set([ProfileAction.VIEW, ProfileAction.EDIT_OWN])
        set_b = set([DashboardAction.VIEW, DashboardAction.EDIT_PUBLIC])
        self.assertEqual(set_a & set_b, set([]))

Python data object motivated by a desire for a mutable namedtuple with default values

2012-08-03T07:49:48-07:00

UPDATE 2016-08-12: Read Glyph's post and use the attrs library instead.

Reasons to use this instead of a namedtuple:

I want to change fields at a later time (mutability)
I want to specify a subset of the fields at instantiation and have the rest be set to a default value

Reasons to use this instead of a dict:

I want to explicitly name the fields in the object
I want to disallow setting fields that are not explicitly named*
I want to specify a subset of the fields at instantiation and have the rest be set to a default value
I want to use attribute style access (dot notation to access fields)

Reasons to use this instead of a regular Python class:

I don't want to duplicate field names in the __init__() method signature and when setting instance attributes of the same name.
I want to disallow setting fields that are not explicitly named*
I want to be able to easily convert the object to a dict or a tuple
I want to save memory

*Note: This Stack Overflow answer warns against using __slots__ for my goal of disallowing setting fields that are not explicitly named. It says metaclasses or decorators should be abused by us control freaks and static typing weenies instead. To comply with that advice, if you don't care about saving memory, __slots__ could be replaced with a non-special attribute, such as _fields. If that is done, attribute creation would no longer be limited.
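
To make the goals above concrete, here is a minimal sketch of a __slots__-based data object with default values. It is an illustration under my own assumptions, not necessarily the original class: the names DataObject, Point, _defaults, to_dict, and to_tuple are all hypothetical.

```python
class DataObject(object):
    """Hypothetical base class: mutable fields, defaults, no stray attributes."""
    __slots__ = ()   # subclasses list their field names here
    _defaults = {}   # subclasses may map field names to default values

    def __init__(self, **kwargs):
        # Set every declared field, falling back to its default (or None).
        for field in self.__slots__:
            setattr(self, field, kwargs.pop(field, self._defaults.get(field)))
        # Anything left over was not an explicitly named field.
        if kwargs:
            raise TypeError('Unknown fields: %s' % ', '.join(sorted(kwargs)))

    def to_dict(self):
        return dict((f, getattr(self, f)) for f in self.__slots__)

    def to_tuple(self):
        return tuple(getattr(self, f) for f in self.__slots__)


class Point(DataObject):
    __slots__ = ('x', 'y', 'label')
    _defaults = {'x': 0, 'y': 0}


p = Point(y=5)        # x falls back to its default, label to None
p.x = 3               # fields are mutable
print(p.to_dict())
print(p.to_tuple())
```

Because neither class has a per-instance __dict__, assigning p.z = 1 raises AttributeError. Replacing __slots__ with a plain attribute such as _fields would remove that restriction, as the note above says.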

Example POSTing binary data using pycurl

2012-08-01T15:51:43-07:00

Since urllib2 doesn't easily support POSTing binary data, I used pycurl. It's less fun to use. I think I remember reading that Requests supports this, but someone said something about speed a while back. It may warrant a second look. I couldn't figure out if httplib2 supports POSTing binary data.

Here's how I would do it with curl:

$ curl -v --data-binary @blank-contact-photo.jpg -H 'Content-Type: image/jpeg' 'http://localhost:8000' 
* About to connect() to localhost port 8000 (#0)
*   Trying 127.0.0.1... connected
> POST / HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: localhost:8000
> Accept: */*
> Content-Type: image/jpeg
> Content-Length: 1335
> Expect: 100-continue
> 
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server
* Closing connection #0

Here are the headers received by my simulated server:

['User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3\r\n',
 'Host: localhost:8000\r\n',
 'Accept: */*\r\n',
 'Content-Type: image/jpeg\r\n',
 'Content-Length: 1335\r\n',
 'Expect: 100-continue\r\n']

Here's what I tried. Method 4 is what worked.

import StringIO
import os.path
import pycurl

def main():
    """
    http://curl.haxx.se/libcurl/c/curl_easy_setopt.html
    http://code.activestate.com/recipes/576422-python-http-post-binary-file-upload-with-pycurl/
    http://pycurl.cvs.sourceforge.net/pycurl/pycurl/tests/test_post2.py?view=markup
    """
    method = 4
    filename = 'blank-contact-photo.jpg'
    url = 'http://localhost:8000'

    c = pycurl.Curl()
    c.setopt(pycurl.VERBOSE, 1)
    c.setopt(pycurl.URL, url)
    fout = StringIO.StringIO()
    c.setopt(pycurl.WRITEFUNCTION, fout.write)

    if method == 1:
        c.setopt(pycurl.HTTPPOST, [
                ("file1",
                 (c.FORM_FILE, filename))])
        c.setopt(pycurl.HTTPHEADER, ['Content-Type: image/jpeg'])
    elif method == 2:
        c.setopt(c.HTTPPOST, [
                ("uploadfieldname",
                 (c.FORM_FILE, filename,
                  c.FORM_CONTENTTYPE, "image/jpeg"))])
    elif method == 3:
        c.setopt(pycurl.UPLOAD, 1)
        c.setopt(pycurl.READFUNCTION, open(filename, 'rb').read)
        filesize = os.path.getsize(filename)
        c.setopt(pycurl.INFILESIZE, filesize)
    elif method == 4:
        c.setopt(pycurl.POST, 1)
        c.setopt(pycurl.HTTPHEADER, [
                'Content-Type: image/jpeg'])

        filesize = os.path.getsize(filename)
        c.setopt(pycurl.POSTFIELDSIZE, filesize)
        fin = open(filename, 'rb')
        c.setopt(pycurl.READFUNCTION, fin.read)

    c.perform()
    response_code = c.getinfo(pycurl.RESPONSE_CODE)
    response_data = fout.getvalue()
    print response_code
    print response_data
    c.close()


if __name__ == '__main__':
    main()
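
For what it's worth, on Python 3 the standard library's urllib.request accepts a bytes body directly, so pycurl may not be needed for a case like this. The following is a sketch I have not run against the simulated server; the helper name build_binary_post and the placeholder payload are made up for illustration.

```python
import urllib.request


def build_binary_post(url, payload, content_type='image/jpeg'):
    """Build (but do not send) a POST request whose body is raw bytes."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={'Content-Type': content_type},
        method='POST',
    )


if __name__ == '__main__':
    # Hypothetical stand-in for the bytes of blank-contact-photo.jpg.
    # In practice: payload = open(filename, 'rb').read()
    payload = b'\xff\xd8\xff\xe0'
    req = build_binary_post('http://localhost:8000', payload)
    # urllib stores header names capitalized as 'Content-type'.
    print(req.get_method(), req.get_header('Content-type'), len(req.data))
    # To actually send it: urllib.request.urlopen(req).read()
```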

Here's the code for my simulated server in case you're curious:

import BaseHTTPServer
import SocketServer


class MyHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_POST(self):
        # from debugtools import pvar; pvar('self')
        # from debugtools import pvar; pvar('dir(self)')
        # from debugtools import pvar; pvar('vars(self)')
        # from debugtools import pvar; pvar('self.request')
        # from debugtools import pvar; pvar('self.rfile')
        # from debugtools import pvar; pvar('self.headers')
        # from debugtools import pvar; pvar('self.headers.headers')
        print self.headers.headers
        return 'asdfasdf'

PORT = 8000

Handler = MyHandler

SocketServer.TCPServer.allow_reuse_address = True
httpd = SocketServer.TCPServer(("", PORT), Handler)

print "serving at port", PORT
try:
    httpd.serve_forever()
except KeyboardInterrupt:
    httpd.shutdown()

How to prevent nose (unittest) from using the docstring when verbosity >= 2

2012-07-06T09:57:35-07:00

Some of our Python unit tests have docstrings. I find it annoying that, when using a verbosity level >= 2, nose prints the docstring instead of the class name and method name. Here's a hack to prevent it from doing that: Add a shortDescription() method to the test case class that returns None.

Here is an example of normal behavior:

import unittest

class MyTestCase(unittest.TestCase):
    def test_with_docstring(self):
        """Test that something does something
        """

    def test_without_docstring(self):
        pass

$ nosetests --verbosity=2 tmp.py
Test that something does something ... ok
test_without_docstring (tmp.MyTestCase) ... ok

Here is an example with the hack to prevent printing the docstring:

import unittest

class MyTestCase(unittest.TestCase):
    def shortDescription(self):
        return None

    def test_with_docstring(self):
        """Test that something does something
        """

    def test_without_docstring(self):
        pass

$ nosetests --verbosity=2 tmp.py
test_with_docstring (tmp.MyTestCase) ... ok
test_without_docstring (tmp.MyTestCase) ... ok
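
If several test case classes need the same hack, one option is to put shortDescription() in a mixin. This is just a sketch; the mixin name is made up.

```python
import unittest


class NoDocstringDescriptionMixin(object):
    """Hypothetical mixin: report tests by name, never by docstring."""

    def shortDescription(self):
        # Returning None makes the runner fall back to the method
        # and class name when it describes the test.
        return None


class MyTestCase(NoDocstringDescriptionMixin, unittest.TestCase):
    def test_with_docstring(self):
        """Test that something does something
        """

    def test_without_docstring(self):
        pass
```

The mixin has to come before unittest.TestCase in the list of base classes so that its shortDescription() is found first.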

Test coverage with nose and coverage.py

2012-04-07T15:32:29-07:00

It's fun to use nose + coverage.py to show my progress as I write tests. Seeing the bar next to my code change from red to green makes me happy. 100% test coverage does not mean tests are complete. For example, a boolean OR'ed conditional expression may not test all conditions even though the line is marked as covered. Other limitations are discussed here: Flaws in coverage measurement. However, good test coverage is at least a step towards having a good test suite.
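
Here is a small made-up example of the OR'ed conditional problem. The function and test names are hypothetical. The test below executes every line (and both branches of the if), so coverage reports 100%, yet the case where is_admin is True is never exercised.

```python
def can_edit(is_owner, is_admin):
    if is_owner or is_admin:
        return True
    return False


def test_can_edit():
    # Takes the "if" branch, but short-circuits before is_admin matters.
    assert can_edit(True, False) is True
    # Takes the fall-through branch.
    assert can_edit(False, False) is False
    # Missing: can_edit(False, True), which coverage will not flag.


test_can_edit()
```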

Install nose and coverage.py

Activate your virtualenv and pip install nose and coverage.

$ pip install nose 
$ pip install coverage

Run it

Here is the command line I use to run the tests. --with-coverage enables the nose-coverage plugin to check test coverage. --cover-erase erases coverage test results from a previous run. --cover-package specifies which Python package to analyze. Specify the package as you would using an import (e.g. dp.blueprints.info.views). If --cover-package is not specified, it will analyze everything. --cover-html enables pretty HTML coverage reports. This example is for the flask-encryptedsession tests.

$ nosetests --with-coverage --cover-erase --cover-package=flask_encryptedsession --cover-html
..........
Name                                      Stmts   Miss  Cover   Missing
-----------------------------------------------------------------------
flask_encryptedsession                        0      0   100%   
flask_encryptedsession.encryptedcookie       41      1    98%   176
flask_encryptedsession.encryptedsession      35      1    97%   75
-----------------------------------------------------------------------
TOTAL                                        76      2    97%   
----------------------------------------------------------------------
Ran 10 tests in 0.188s

OK

Display the HTML report

$ firefox cover/index.html

Get branch coverage

Branch coverage is useful for checking "if" statements without an explicit "else" in the code. ~~I had to install the development version of nose to use this feature:~~ As of version 1.2.0, this feature is available.

~~$ pip install https://github.com/nose-devs/nose/tarball/master~~

$ nosetests --cover-branches --with-coverage --cover-erase --cover-package=flask_encryptedsession --cover-html 
..........
Name                                      Stmts   Miss Branch BrPart  Cover   Missing
-------------------------------------------------------------------------------------
flask_encryptedsession                        0      0      0      0   100%   
flask_encryptedsession.encryptedcookie       41      1     12      1    96%   176
flask_encryptedsession.encryptedsession      35      1      4      1    95%   75
-------------------------------------------------------------------------------------
TOTAL                                        76      2     16      2    96%   
----------------------------------------------------------------------
Ran 10 tests in 0.234s

OK

Pycon 2012 talks that I saw that I enjoyed

2012-03-24T09:02:44-07:00

The PyCon 2012 videos are up at pyvideo.org. Here are some of the talks I saw and enjoyed. I know I probably missed some great talks, so I will try to watch more online. Let me know if there are some that I should not miss.

Favorite talk of the conference

Saturday Keynote: Let's Talk About pypy by David Beazley. It was awesome! See also his PyCon blog post.

Other great talks (in chronological order)

The Art of Subclassing by Raymond Hettinger. Great instruction on a topic I struggle with (object-oriented code).
Stop Writing Classes by Jack Diederich. This was the talk I anticipated most because I really enjoyed Jack Diederich's other talks and this was a topic I have struggled with. It did not have as much as I had hoped, but it was still good. See also: Hacker News thread
Through The Ether And Back Again: What Happens To A Packet When You Send It by Glyph Lefkowitz. I don't deal with much network code, but this was a great talk. I wonder why his Twisted talk was not approved...
Flexing SQLAlchemy's Relational Power by Brandon Rhodes. Great talk to help my weak SQL knowledge with a tool I will be using more.
Hand Coded Applications with SQLAlchemy by Michael Bayer. Ditto above-- great talk from SQLAlchemy's creator.
Python for data lovers: explore it, analyze it, map it by Jacqueline Kazil and Dana Bauer. Not exactly related to what I do, but it was a well-presented, interesting talk.
(watched online) Militarizing Your Backyard with Python: Computer Vision and the Squirrel Hordes by Kurt Grandis. Really cool talk. I wish I could be more like this guy. He got this fun side project running in a couple months. It would take me that long just to pick out a library and run Hello World. See also his PyCon blog post.

Setting the Expires header for S3 media using Python and boto

2012-02-11T00:12:44-08:00

Install boto

$ pip install boto 
$ pip freeze |grep boto 
boto==2.2.1

Script

This script sets the "Expires" header 25 years from the current date for all the files starting with the prefix "mydirectory". Replace the access key id, secret access key, and bucket.

import mimetypes
from datetime import datetime, timedelta

from boto.s3.connection import S3Connection


AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
BUCKET_NAME = 'mybucket'
PREFIX = 'mydirectory'


def main():
    conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
    bucket = conn.get_bucket(BUCKET_NAME)
    key_list = bucket.get_all_keys(prefix=PREFIX)
    for key in key_list:
        content_type, unused = mimetypes.guess_type(key.name)
        if not content_type:
            content_type = 'text/plain'
        expires = datetime.utcnow() + timedelta(days=(25 * 365))
        expires = expires.strftime("%a, %d %b %Y %H:%M:%S GMT")
        metadata = {'Expires': expires, 'Content-Type': content_type}
        print key.name, metadata
        key.copy(BUCKET_NAME, key, metadata=metadata, preserve_acl=True)


if __name__ == '__main__':
    main()
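
A note on the date arithmetic, shown as a standalone sketch that needs no AWS access: timedelta(days=25 * 365) ignores leap days, so the result lands about a week short of 25 calendar years. For an Expires header that is harmless. The helper name expires_header is made up for this example.

```python
from datetime import datetime, timedelta


def expires_header(now, days):
    """Format now + days as an HTTP date string (assumes now is in UTC)."""
    return (now + timedelta(days=days)).strftime('%a, %d %b %Y %H:%M:%S GMT')


# Seven leap days fall between 2012-02-11 and 2037-02-11,
# so 25 * 365 days lands on 2037-02-04 rather than 2037-02-11.
start = datetime(2012, 2, 11)
print(expires_header(start, 25 * 365))
```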


How to get the filename and its parent directory in Python

2011-12-28T11:06:05-08:00

import os.path

def get_file_with_parents(filepath, levels=1):
    common = filepath
    for i in range(levels + 1):
        common = os.path.dirname(common)
    return os.path.relpath(filepath, common)

# print() with a single argument works on both Python 2 and Python 3
print(get_file_with_parents(
    '/opt/xyzabc/etc/yaml/working/group1/vendor1.yaml', 1))
print(get_file_with_parents(
    '/opt/xyzabc/etc/yaml/working/group1/vendor1.yaml', 2))

Results:

group1/vendor1.yaml
working/group1/vendor1.yaml
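
On Python 3.4 and later, the same result can probably be had with pathlib by keeping the last levels + 1 path components. This is an alternative sketch, not the function above; it uses PurePosixPath so the behavior does not depend on the operating system, and the function name is made up.

```python
from pathlib import PurePosixPath


def get_file_with_parents_pathlib(filepath, levels=1):
    """Return the filename plus the given number of parent directories."""
    parts = PurePosixPath(filepath).parts
    # Keep the filename and the `levels` directories directly above it.
    return str(PurePosixPath(*parts[-(levels + 1):]))


path = '/opt/xyzabc/etc/yaml/working/group1/vendor1.yaml'
print(get_file_with_parents_pathlib(path, 1))
print(get_file_with_parents_pathlib(path, 2))
```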

Creating a histogram plot with python

2011-12-08T10:35:43-08:00

from matplotlib import pyplot


filename = 'times_1201.txt'
lines = open(filename).readlines()
x = [int(line.strip()) for line in lines]
bins = [i * 1000 for i in range(10)]

pyplot.hist(x, bins=bins, facecolor='green', alpha=0.75)
pyplot.xlabel('Time (ms)')
pyplot.ylabel('Count')
pyplot.suptitle(r'Sup title')
pyplot.title(r'Title')
pyplot.grid(True)
pyplot.savefig(filename + '.png')

A unique Python redis-based queue with delay

2011-11-08T09:38:44-08:00

This is a simple Redis-based queue. Two features that I needed were uniqueness (i.e. if an item exists in the queue already, it won't be added again) and a delay, like beanstalkd, where an item must wait a specified time before it can be popped from the queue. There are a number of other Redis-based queues that have many more features but I didn't see one that had these two features together. This 50-line class works for my needs. It may or may not work for you. Feel free to copy this and build on it.

Note: I wrote this in May 2010. I ended up using this solution after trying out beanstalkd and Gearman.

Install

Install on Ubuntu 10.10 Maverick

Install the redis server
```
$ sudo apt-get install redis-server 
```
Install the python redis client
```
$ pip install redis 
```
Default conf file: /etc/redis/redis.conf
Default log file: /var/log/redis/redis-server.log
Default db dir: /var/lib/redis
Stop redis server: sudo /etc/init.d/redis-server stop
Start redis server: sudo /etc/init.d/redis-server start

Redis commands used

The queue is based on the redis sorted set data type and uses the following commands:

ZADD - Add members to a sorted set, or update its score if it already exists
ZRANGEBYSCORE - Return a range of members in a sorted set, by score
ZREM - Remove one or more members from a sorted set
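
To show how a sorted set gives both uniqueness and delay, here is an in-memory sketch that imitates the three commands with a plain dict mapping each member to its score. It does not talk to Redis, and the class name and the explicit now argument are made up so the behavior can be checked without sleeping.

```python
class InMemoryDelayQueue(object):
    """Hypothetical stand-in for a Redis sorted set used as a delay queue."""

    def __init__(self):
        self._scores = {}  # member -> score (the time it becomes ready)

    def add(self, data, delay, now):
        # Like ZADD: adding an existing member only updates its score,
        # so the queue never holds duplicates.
        self._scores[data] = now + delay

    def pop(self, now):
        # Like ZRANGEBYSCORE with min 0, max now, limited to one result:
        # return the ready member with the lowest score, or None.
        ready = [(score, data) for data, score in self._scores.items()
                 if score <= now]
        if not ready:
            return None
        return min(ready)[1]

    def remove(self, data):
        # Like ZREM: report whether the member was present.
        return self._scores.pop(data, None) is not None


q = InMemoryDelayQueue()
q.add('Item 0', delay=5, now=0)
q.add('Item 1', delay=5, now=1)
q.add('Item 0', delay=5, now=2)  # duplicate: score moves from 5 to 7
print(q.pop(now=5))              # nothing is ready yet
print(q.pop(now=6))              # 'Item 1' is ready first
```

As in the Redis version, pop() does not remove the item; the caller calls remove() once it has handled it.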

Code

import time
import redis


REDIS_ADDRESS = '127.0.0.1'


class UniqueMessageQueueWithDelay(object):
    """A message queue based on the Redis sorted set data type. Duplicate items
    in the queue are not allowed. When a duplicate item is added to the queue,
    the new item is added, and the old duplicate item is removed. A delay may be
    specified when adding items to the queue. Items will only be popped after
    the delay has passed. Pop() is non-blocking, so polling must be used. The
    name of the queue == the Redis key for the sorted set.
    """
    def __init__(self, name):
        self.name = name
        self.redis = redis.Redis(REDIS_ADDRESS)

    def add(self, data, delay=0):
        """Add an item to the queue. delay is in seconds.
        """
        score = time.time() + delay
        self.redis.zadd(self.name, data, score)
        debug('Added %.1f, %s' % (score, data))

    def pop(self):
        """Pop one item from the front of the queue. Items are popped only if
        the delay specified in the add() has passed. Return False if no items
        are available.
        """
        min_score = 0
        max_score = time.time()
        result = self.redis.zrangebyscore(
            self.name, min_score, max_score, start=0, num=1, withscores=False)
        if result is None:
            return False
        if len(result) == 1:
            debug('Popped %s' % result[0])
            return result[0]
        else:
            return False

    def remove(self, data):
        return self.redis.zrem(self.name, data)


def debug(msg):
    print msg


def test_queue():
    u = UniqueMessageQueueWithDelay('myqueue')

    # add items to the queue
    for i in [0, 1, 2, 3, 4, 0, 1]:
        data = 'Item %d' % i
        delay = 5
        u.add(data, delay)
        time.sleep(0.1)

    # get items from the queue
    while True:
        print
        result = u.pop()
        print result
        if result != False:
            u.remove(result)
        time.sleep(1)


if __name__ == '__main__':
    test_queue()

Results:

Added 1320773851.8, Item 0
Added 1320773851.9, Item 1
Added 1320773852.0, Item 2
Added 1320773852.1, Item 3
Added 1320773852.2, Item 4
Added 1320773852.3, Item 0
Added 1320773852.4, Item 1

False

False

False

False

False

Popped Item 2
Item 2

Popped Item 3
Item 3

Popped Item 4
Item 4

Popped Item 0
Item 0

Popped Item 1
Item 1

False

False

False
^CTraceback (most recent call last):
  File "umqwdredisqueue.py", line 102, in <module>
    test_queue()
  File "umqwdredisqueue.py", line 98, in test_queue
    time.sleep(1)
KeyboardInterrupt


How to get username, home directory, and hostname with Python

2011-11-05T13:29:05-07:00

Get username

Use getpass.getuser

import getpass
username = getpass.getuser()
print(username)

saltycrane

Get home directory

Use os.path.expanduser (h/t Bachsau) or os.environ

import os.path
homedir = os.path.expanduser("~")
print(homedir)

import os
homedir = os.environ['HOME']
print(homedir)

/home/saltycrane

Get hostname

Use socket.gethostname

import socket
hostname = socket.gethostname()
print(hostname)

scooby

Find all combinations of a set of lists with itertools.product

2011-11-01T09:37:32-07:00

Copied from http://stackoverflow.com/questions/2853212/all-possible-permutations-of-a-set-of-lists-in-python. Documentation: itertools.product

import itertools
from pprint import pprint

inputdata = [
    ['a', 'b', 'c'],
    ['d'],
    ['e', 'f'],
]
result = list(itertools.product(*inputdata))
pprint(result)

Results:

[('a', 'd', 'e'),
 ('a', 'd', 'f'),
 ('b', 'd', 'e'),
 ('b', 'd', 'f'),
 ('c', 'd', 'e'),
 ('c', 'd', 'f')]

Python gnupg (GPG) example

2011-10-28T09:21:26-07:00

python-gnupg is a Python package for encrypting and decrypting strings or files using GNU Privacy Guard (GnuPG or GPG). GPG is an open source alternative to Pretty Good Privacy (PGP). A popular use of GPG and PGP is encrypting email. For more information, see the python-gnupg documentation. Another option for encrypting data from Python is keyczar.

Install

This installs the Ubuntu GPG package, creates a test user, and installs the Python package, python-gnupg. This was installed on Ubuntu 10.10 Maverick Meerkat.

$ sudo apt-get install gnupg 
$ sudo adduser testgpguser 
$ sudo su testgpguser 
$ cd 
$ virtualenv --no-site-packages venv 
$ source venv/bin/activate 
$ pip install python-gnupg

Generate a key

This creates a GPG key. This also creates the gpghome directory if it does not exist. You may need to supply random hardware activity during the key generation. See the docs for more information. To generate random numbers, you can also install the rng-tools package.

$ sudo apt-get install rng-tools

import os
import gnupg

os.system('rm -rf /home/testgpguser/gpghome')
gpg = gnupg.GPG(gnupghome='/home/testgpguser/gpghome')
input_data = gpg.gen_key_input(
    name_email='testgpguser@mydomain.com',
    passphrase='my passphrase')
key = gpg.gen_key(input_data)
print key

B0F4CF530036CE8CD1C064F17D32CEE72C015CD5

Export keys

import gnupg

gpg = gnupg.GPG(gnupghome='/home/testgpguser/gpghome')
# key is the result of gpg.gen_key() from the "Generate a key" step above
ascii_armored_public_keys = gpg.export_keys(key)
ascii_armored_private_keys = gpg.export_keys(key, True)
with open('mykeyfile.asc', 'w') as f:
    f.write(ascii_armored_public_keys)
    f.write(ascii_armored_private_keys)

(venv)testgpguser@mymachine:~$ cat mykeyfile.asc 
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1.4.10 (GNU/Linux)

mI0ETqrVGAEEAP42Xs1vQv40MxA3/g/Le5B0VatnDYaSvAhiYfaub79HY4mjYcCD
FPDo5b54PSzyhlVsz5RL46+RE9NpQ2JdvFofWi7eVzfdmmTtNYEaiUSmzLUq73Vz
qu7P1RhOfwuAyW0otnw/Lw54MVjVZblvp3ln1Fcpleb9ZSrY1h61Y8pHABEBAAG0
REF1dG9nZW5lcmF0ZWQgS2V5IChHZW5lcmF0ZWQgYnkgZ251cGcucHkpIDx0ZXN0
Z3BndXNlckBteWRvbWFpbi5jb20+iLgEEwECACIFAk6q1RgCGy8GCwkIBwMCBhUI
AgkKCwQWAgMBAh4BAheAAAoJEH0yzucsAVzVBjwD/1KgTx1y3cpuumu1HF0GtQV0
Wn7l9OaSj98CqQ/f2emHD1l9rrjdt9jm1g7wSsWumpKs57vxz7NXwHw7mI4qZ5m0
cvg/qRc/BBMP8v2WgzRsmls97Pplaate1k3QfvDCVs6F1qiIQyELffjxBHbmWPhx
XEwhnpLcvk2l7NbNnEwA
=exDD
-----END PGP PUBLIC KEY BLOCK-----
-----BEGIN PGP PRIVATE KEY BLOCK-----
Version: GnuPG v1.4.10 (GNU/Linux)

lQH+BE6q1RgBBAD+Nl7Nb0L+NDMQN/4Py3uQdFWrZw2GkrwIYmH2rm+/R2OJo2HA
gxTw6OW+eD0s8oZVbM+US+OvkRPTaUNiXbxaH1ou3lc33Zpk7TWBGolEpsy1Ku91
c6ruz9UYTn8LgMltKLZ8Py8OeDFY1WW5b6d5Z9RXKZXm/WUq2NYetWPKRwARAQAB
/gMDAq5W6uxeU2hDYDPZ1Yy+e97ppNXmdAeq1urZHmiPr4+a36nOWd6j0R/HBjG3
ELD8CqYiQ0vx8+F9rY/uwKga2bEkJsQXjvaaZtu97lzPyp2+avsaw2G+3jRAJWNL
5YG4c/XwK1cfEajM23f7zz/t6TRWG+Ve2Dzi7+obA0LuF8czSlpiTTEzLDk8QJCK
y2WmrZ+s+POWv3itVpI26o7PvTQESzwyKXdyCW2W66VnXTm4mQEL6kgyV0oO6xIl
QUVSn2XWvwFMg2iL+02zA467rsr1x6Nl8hEQJgFwJCejD2z+4C4yzEeQGFP9WUps
pbMedAjDHebhC9FzbW7yuQ3H7iTCK1mvidAFw2wTdrkH61ApzmSo/rSTSxXw7hLT
M/ONgYZtvr+CpJj+mIu1XvVDiftvMhXlwcvM8c9PB3zv+086K7kJDTnzPgYvL0H/
+V2b9X9BBfAax40MQuxZJWseaLtsxXyl/rhn8jSCFZoqtERBdXRvZ2VuZXJhdGVk
IEtleSAoR2VuZXJhdGVkIGJ5IGdudXBnLnB5KSA8dGVzdGdwZ3VzZXJAbXlkb21h
aW4uY29tPoi4BBMBAgAiBQJOqtUYAhsvBgsJCAcDAgYVCAIJCgsEFgIDAQIeAQIX
gAAKCRB9Ms7nLAFc1QY8A/9SoE8dct3KbrprtRxdBrUFdFp+5fTmko/fAqkP39np
hw9Zfa643bfY5tYO8ErFrpqSrOe78c+zV8B8O5iOKmeZtHL4P6kXPwQTD/L9loM0
bJpbPez6ZWmrXtZN0H7wwlbOhdaoiEMhC3348QR25lj4cVxMIZ6S3L5NpezWzZxM
AA==
=v9Z7
-----END PGP PRIVATE KEY BLOCK-----

Import keys

import gnupg
from pprint import pprint

gpg = gnupg.GPG(gnupghome='/home/testgpguser/gpghome')
key_data = open('mykeyfile.asc').read()
import_result = gpg.import_keys(key_data)
pprint(import_result.results)

[{'fingerprint': u'B0F4CF530036CE8CD1C064F17D32CEE72C015CD5',
  'ok': u'0',
  'text': 'Not actually changed\n'},
 {'fingerprint': u'B0F4CF530036CE8CD1C064F17D32CEE72C015CD5',
  'ok': u'16',
  'text': 'Contains private key\nNot actually changed\n'}]

List keys

import gnupg
from pprint import pprint

gpg = gnupg.GPG(gnupghome='/home/testgpguser/gpghome')
public_keys = gpg.list_keys()
private_keys = gpg.list_keys(True)
print 'public keys:'
pprint(public_keys)
print 'private keys:'
pprint(private_keys)

public keys:
[{'algo': u'1',
  'date': u'1319818520',
  'dummy': u'',
  'expires': u'',
  'fingerprint': u'B0F4CF530036CE8CD1C064F17D32CEE72C015CD5',
  'keyid': u'7D32CEE72C015CD5',
  'length': u'1024',
  'ownertrust': u'u',
  'trust': u'u',
  'type': u'pub',
  'uids': [u'Autogenerated Key (Generated by gnupg.py) <testgpguser@mydomain.com>']}]
private keys:
[{'algo': u'1',
  'date': u'1319818520',
  'dummy': u'',
  'expires': u'',
  'fingerprint': u'B0F4CF530036CE8CD1C064F17D32CEE72C015CD5',
  'keyid': u'7D32CEE72C015CD5',
  'length': u'1024',
  'ownertrust': u'',
  'trust': u'',
  'type': u'sec',
  'uids': [u'Autogenerated Key (Generated by gnupg.py) <testgpguser@mydomain.com>']}]

Encrypt a string

import gnupg

gpg = gnupg.GPG(gnupghome='/home/testgpguser/gpghome')
unencrypted_string = 'Who are you? How did you get in my house?'
encrypted_data = gpg.encrypt(unencrypted_string, 'testgpguser@mydomain.com')
encrypted_string = str(encrypted_data)
print 'ok: ', encrypted_data.ok
print 'status: ', encrypted_data.status
print 'stderr: ', encrypted_data.stderr
print 'unencrypted_string: ', unencrypted_string
print 'encrypted_string: ', encrypted_string

ok:  True
status:  encryption ok
stderr:  [GNUPG:] BEGIN_ENCRYPTION 2 9
[GNUPG:] END_ENCRYPTION

unencrypted_string:  Who are you? How did you get in my house?
encrypted_string:  -----BEGIN PGP MESSAGE-----
Version: GnuPG v1.4.10 (GNU/Linux)

hIwDFuhrAS77HYIBBACXqZ66rkGQv8yE61JddEmad3fUNvbfkhBPUI9OSaMO3PbN
Q/6SIDyi3FmhbM9icOBS7q3xddQpvFhwmrq9e3VLKnV3NSmWo+xJWosQ/GNAA/Hb
cwF1pOtR6bRHFBkqtmpTYnBo9rMpokW8lp4WxFxMda+af8TlId8HC0WcRUg4kNJi
AdV1fsd+sD/cGIp0cAltpaVuO4/uwV9lKd39VER6WigLDaeFUHjWhJbcHwTaJYHj
qmy5LRciNSjwsqeMK4zOFZyRPUqPVKwWLiE9kImMni0Nj/K54ElWujgTttZIlBqV
5+c=
=SM4r
-----END PGP MESSAGE-----

Decrypt a string

import gnupg

gpg = gnupg.GPG(gnupghome='/home/testgpguser/gpghome')
unencrypted_string = 'Who are you? How did you get in my house?'
encrypted_data = gpg.encrypt(unencrypted_string, 'testgpguser@mydomain.com')
encrypted_string = str(encrypted_data)
decrypted_data = gpg.decrypt(encrypted_string, passphrase='my passphrase')

print 'ok: ', decrypted_data.ok
print 'status: ', decrypted_data.status
print 'stderr: ', decrypted_data.stderr
print 'decrypted string: ', decrypted_data.data

ok:  True
status:  decryption ok
stderr:  [GNUPG:] ENC_TO 16E86B012EFB1D82 1 0
[GNUPG:] USERID_HINT 16E86B012EFB1D82 Autogenerated Key (Generated by gnupg.py) <testgpguser@mydomain.com>
[GNUPG:] NEED_PASSPHRASE 16E86B012EFB1D82 16E86B012EFB1D82 1 0
[GNUPG:] GOOD_PASSPHRASE
gpg: encrypted with 1024-bit RSA key, ID 2EFB1D82, created 2011-11-02
      "Autogenerated Key (Generated by gnupg.py) <testgpguser@mydomain.com>"
[GNUPG:] BEGIN_DECRYPTION
[GNUPG:] PLAINTEXT 62 1320545729 
[GNUPG:] PLAINTEXT_LENGTH 41
[GNUPG:] DECRYPTION_OKAY
[GNUPG:] GOODMDC
[GNUPG:] END_DECRYPTION

decrypted string:  Who are you? How did you get in my house?

Encrypt a file

import gnupg

gpg = gnupg.GPG(gnupghome='/home/testgpguser/gpghome')
open('my-unencrypted.txt', 'w').write('You need to Google Venn diagram.')
with open('my-unencrypted.txt', 'rb') as f:
    status = gpg.encrypt_file(
        f, recipients=['testgpguser@mydomain.com'],
        output='my-encrypted.txt.gpg')

print 'ok: ', status.ok
print 'status: ', status.status
print 'stderr: ', status.stderr

ok:  True
status:  encryption ok
stderr:  [GNUPG:] BEGIN_ENCRYPTION 2 9
[GNUPG:] END_ENCRYPTION

(venv)testgpguser@mymachine:~$  cat my-encrypted.txt.gpg 
-----BEGIN PGP MESSAGE-----
Version: GnuPG v1.4.10 (GNU/Linux)

hIwDfTLO5ywBXNUBBADo7trFZUD6Ir1vPRAJsoQXDiiw32N1m9/PXWCnQqX0nyzW
LfluNMfLFQRclNPVEg+o91qhS71apKvagp8DW7SCDE2SdCYk8nAS3bwAg5+GUyDs
XY2E6BQ1cLA1eK1V6D15ih6cq0laRzWuFkehH9PQ5Yp4ZZOmCbopw7dufnYPjdJb
AVGLpZRq64SuN1BUWIHbO7vqQGFq7qhGQwuegblEMm4vyr6FBW6JA/x4G/PMfImZ
1cH6KBrWGWrLCTiU/FKG9JvOm8mg8NXzd/TVjPs6rHRaKPFln37T7cLUwA==
=FSQP
-----END PGP MESSAGE-----

Decrypt a file

import gnupg

gpg = gnupg.GPG(gnupghome='/home/testgpguser/gpghome')
with open('my-encrypted.txt.gpg', 'rb') as f:
    status = gpg.decrypt_file(f, passphrase='my passphrase', output='my-decrypted.txt')

print 'ok: ', status.ok
print 'status: ', status.status
print 'stderr: ', status.stderr

ok:  True
status:  decryption ok
stderr:  [GNUPG:] ENC_TO 16E86B012EFB1D82 1 0
[GNUPG:] USERID_HINT 16E86B012EFB1D82 Autogenerated Key (Generated by gnupg.py) <testgpguser@mydomain.com>
[GNUPG:] NEED_PASSPHRASE 16E86B012EFB1D82 16E86B012EFB1D82 1 0
[GNUPG:] GOOD_PASSPHRASE
gpg: encrypted with 1024-bit RSA key, ID 2EFB1D82, created 2011-11-02
      "Autogenerated Key (Generated by gnupg.py) <testgpguser@mydomain.com>"
[GNUPG:] BEGIN_DECRYPTION
[GNUPG:] PLAINTEXT 62 1320546031 
[GNUPG:] PLAINTEXT_LENGTH 32
[GNUPG:] DECRYPTION_OKAY
[GNUPG:] GOODMDC
[GNUPG:] END_DECRYPTION

(venv)testgpguser@mymachine:~$ cat my-decrypted.txt 
You need to Google Venn diagram.
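
The examples above print ok, status, and stderr after each operation. In a script, it may be more useful to stop as soon as something fails. Here is a minimal sketch of that idea. The names require_ok and GPGOperationError are my own and are not part of python-gnupg. The only assumption is that the result object exposes the ok, status, and stderr attributes shown above.

```python
class GPGOperationError(Exception):
    """Raised when a python-gnupg style result reports a failure."""


def require_ok(result, action):
    """Return result if it succeeded, otherwise raise with GPG's own details.

    result is assumed to expose .ok, .status and .stderr, like the
    objects printed in the examples above.
    """
    if not result.ok:
        raise GPGOperationError(
            "%s failed: status=%r\n%s" % (action, result.status, result.stderr))
    return result
```

It could be used like this: encrypted_data = require_ok(gpg.encrypt(unencrypted_string, 'testgpguser@mydomain.com'), 'encrypt').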

Notes on using keyczar and python

2011-10-05T19:47:51-07:00

Install keyczar

Install pip and virtualenv

Install python-keyczar (and pycrypto and pyasn1):

$ virtualenv kzenv 
$ source kzenv/bin/activate 
$ pip install python-keyczar

Create the keyczar keys

$ mkdir -p /tmp/kz 
$ keyczart create --location=/tmp/kz --purpose=crypt 
$ keyczart addkey --location=/tmp/kz --status=primary

Example python script to encrypt and decrypt

Create my_kz_example.py:

from keyczar import keyczar

s = 'secret string'
location = '/tmp/kz'
crypter = keyczar.Crypter.Read(location)
s_encrypted = crypter.Encrypt(s)
s_decrypted = crypter.Decrypt(s_encrypted)
print s
print s_encrypted
print s_decrypted

Run it:

$ python my_kz_example.py

secret string
AKshuoLCgWOj0cqhgDKUkwClXl9BQ7IJ0zhG3V5Y7MR2xk6T4L3bCvhFQ-49Or8SKWs2JSMg1xnf
secret string

Obsolete instructions

Create command line tool

Create the keyczart script in your ~/bin directory:
$ mkdir ~/bin
$ install -m 755 -o $(whoami) $(python -c 'import keyczar.keyczart as kz; print kz.__file__[:-1]') ~/bin/keyczart

Edit ~/bin/keyczart:

Change the first line from:
#!/usr/bin/python2.4
to:
#!/usr/bin/env python

Change the import statements from:
import errors
import keyczar
import keydata
import keyinfo
import readers
import util
to:
import keyczar.errors as errors
import keyczar.keyczar as keyczar
import keyczar.keydata as keydata
import keyczar.keyinfo as keyinfo
import keyczar.readers as readers
import keyczar.util as util

Some more python recursion examples

2011-10-05T19:35:31-07:00

I had a number of yaml files that contained passwords that needed encrypting. After parsing the yaml file with pyyaml, the data looked something like this:

EXAMPLE_DATA = {
    'jobs': [{'frequency': '* * * * *',
              'jobconfig': [{'config': [('*',
                                         {'maxspeed': 1048576,
                                          'password': 'onesecretpassword',
                                          'port': 22,
                                          'url': 'basset://basset1.domain.com/tootsiepop/123.csv',
                                          'username': 'myusername'})],
                             'hasbro': 'basset'},
                            {'config': [('*',
                                         {'field_delim': ',',
                                          'field_line': True,
                                          'no_blanks': True,
                                          'quote_char': '"'})],
                             'hasbro': 'pen'},
                            {'config': [('*',
                                         {'db_database': 'mydatabase',
                                          'db_host': 'myhost',
                                          'db_password': 'anothersecretpassword',
                                          'db_table': 'mytable',
                                          'db_user': 'myuser'})],
                             'hasbro': 'dart'}],
              'jobdesc': 'Data from tootsiepop',
              'jobname': 'tootsiepop',
              'max_records_fail': '110%',
              'min_failure_time': '1000y'}],
    'vendor': 'tootsiepop'}

Print all leaf nodes

Here is a recursive function that prints all the leaf nodes of my nested data structure.

def print_all_leaf_nodes(data):
    if isinstance(data, dict):
        for item in data.values():
            print_all_leaf_nodes(item)
    elif isinstance(data, list) or isinstance(data, tuple):
        for item in data:
            print_all_leaf_nodes(item)
    else:
        print data

print_all_leaf_nodes(EXAMPLE_DATA)

Results:

tootsiepop
1000y
tootsiepop
*
basset://basset1.domain.com/tootsiepop/123.csv
myusername
onesecretpassword
1048576
22
basset
*
True
"
,
True
pen
*
anothersecretpassword
mytable
myhost
mydatabase
myuser
dart
* * * * *
110%
Data from tootsiepop

Get all leaf nodes

This function returns all leaf nodes as a list instead of printing them. A wrapper function is used to create a Namespace instance to hold the results variable. This could alternatively be stored in a global (module-level) variable. See my notes on variable scope for more info about using a class as a namespace.

def get_all_leaf_nodes(data):
    class Namespace(object):
        pass
    ns = Namespace()
    ns.results = []

    def inner(data):
        if isinstance(data, dict):
            for item in data.values():
                inner(item)
        elif isinstance(data, list) or isinstance(data, tuple):
            for item in data:
                inner(item)
        else:
            ns.results.append(data)

    inner(data)
    return ns.results

from pprint import pprint
pprint(get_all_leaf_nodes(EXAMPLE_DATA))

Results:

['tootsiepop',
 '1000y',
 'tootsiepop',
 '*',
 'basset://basset1.domain.com/tootsiepop/123.csv',
 'myusername',
 'onesecretpassword',
 1048576,
 22,
 'basset',
 '*',
 True,
 '"',
 ',',
 True,
 'pen',
 '*',
 'anothersecretpassword',
 'mytable',
 'myhost',
 'mydatabase',
 'myuser',
 'dart',
 '* * * * *',
 '110%',
 'Data from tootsiepop']
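
As an alternative to the Namespace wrapper, the same traversal can be sketched as a generator, which needs no shared results variable at all. This is not the code I used at the time. It assumes Python 3.3 or later (for yield from), and the small sample dict is made up for illustration.

```python
def iter_leaf_nodes(data):
    """Yield every non-container value found in nested dicts, lists and tuples."""
    if isinstance(data, dict):
        for item in data.values():
            yield from iter_leaf_nodes(item)
    elif isinstance(data, (list, tuple)):
        for item in data:
            yield from iter_leaf_nodes(item)
    else:
        yield data


# Hypothetical sample data, much smaller than EXAMPLE_DATA
sample = {'vendor': 'tootsiepop', 'jobs': [{'config': [('*', {'port': 22})]}]}
print(list(iter_leaf_nodes(sample)))
```

Calling list() on the generator gives the same kind of list that get_all_leaf_nodes returns. The caller can also loop over the generator directly and stop early.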

Get all leaf key value pairs

This function gets all key value pairs whose values are not compound data structures (i.e. dicts, lists, or tuples).

def get_all_key_value_pairs_where_values_are_simple(data):
    class Namespace(object):
        pass
    ns = Namespace()
    ns.results = []

    def inner(data):
        if isinstance(data, dict):
            for k, v in data.iteritems():
                if (isinstance(v, dict) or
                    isinstance(v, list) or
                    isinstance(v, tuple)
                    ):
                    inner(v)
                else:
                    ns.results.append((k, v))
        elif isinstance(data, list) or isinstance(data, tuple):
            for item in data:
                inner(item)

    inner(data)
    return ns.results

from pprint import pprint
pprint(get_all_key_value_pairs_where_values_are_simple(EXAMPLE_DATA))

Results:

[('vendor', 'tootsiepop'),
 ('min_failure_time', '1000y'),
 ('jobname', 'tootsiepop'),
 ('url', 'basset://basset1.domain.com/tootsiepop/123.csv'),
 ('username', 'myusername'),
 ('password', 'onesecretpassword'),
 ('maxspeed', 1048576),
 ('port', 22),
 ('hasbro', 'basset'),
 ('field_line', True),
 ('quote_char', '"'),
 ('field_delim', ','),
 ('no_blanks', True),
 ('hasbro', 'pen'),
 ('db_password', 'anothersecretpassword'),
 ('db_table', 'mytable'),
 ('db_host', 'myhost'),
 ('db_database', 'mydatabase'),
 ('db_user', 'myuser'),
 ('hasbro', 'dart'),
 ('frequency', '* * * * *'),
 ('max_records_fail', '110%'),
 ('jobdesc', 'Data from tootsiepop')]

Modify values of terminal key value in a nested dict

This function modifies, in place, every dict value that is not itself a compound data structure (i.e. a dict, list, or tuple). The modfn argument is a function that computes the new value. It should accept two arguments, a key and a value, and return the modified value.

The example function, super_secure_encrypt, checks whether the string 'password' is in the key and, if so, "encrypts" the value using the <sarcasm>super secure</sarcasm> ROT13 algorithm. (We are actually using the keyczar toolkit from Google to do the encryption.)

def modify_all_simple_dict_values(data, modfn):
    if isinstance(data, dict):
        for k, v in data.iteritems():
            if (isinstance(v, dict) or
                isinstance(v, list) or
                isinstance(v, tuple)
                ):
                modify_all_simple_dict_values(v, modfn)
            else:
                data[k] = modfn(k, v)
    elif isinstance(data, list) or isinstance(data, tuple):
        for item in data:
            modify_all_simple_dict_values(item, modfn)

    return data


def super_secure_encrypt(key, value):
    if 'password' in key:
        value = value.encode('rot13')
    return value


from pprint import pprint
pprint(modify_all_simple_dict_values(EXAMPLE_DATA, super_secure_encrypt))

Results:

{'jobs': [{'frequency': '* * * * *',
           'jobconfig': [{'config': [('*',
                                      {'maxspeed': 1048576,
                                       'password': 'barfrpergcnffjbeq',
                                       'port': 22,
                                       'url': 'basset://basset1.domain.com/tootsiepop/123.csv',
                                       'username': 'myusername'})],
                          'hasbro': 'basset'},
                         {'config': [('*',
                                      {'field_delim': ',',
                                       'field_line': True,
                                       'no_blanks': True,
                                       'quote_char': '"'})],
                          'hasbro': 'pen'},
                         {'config': [('*',
                                      {'db_database': 'mydatabase',
                                       'db_host': 'myhost',
                                       'db_password': 'nabgurefrpergcnffjbeq',
                                       'db_table': 'mytable',
                                       'db_user': 'myuser'})],
                          'hasbro': 'dart'}],
           'jobdesc': 'Data from tootsiepop',
           'jobname': 'tootsiepop',
           'max_records_fail': '110%',
           'min_failure_time': '1000y'}],
 'vendor': 'tootsiepop'}
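
Two things about the function above are worth a sketch: it changes the data it is given, and value.encode('rot13') only works on Python 2. Below is a hedged Python 3 variant that returns a new structure instead of mutating its input and uses codecs.encode for ROT13. The names map_simple_dict_values and rot13_passwords are my own for this illustration.

```python
import codecs


def map_simple_dict_values(data, modfn):
    """Return a copy of data with modfn(key, value) applied to every
    dict value that is not itself a dict, list or tuple."""
    if isinstance(data, dict):
        return {
            k: (map_simple_dict_values(v, modfn)
                if isinstance(v, (dict, list, tuple)) else modfn(k, v))
            for k, v in data.items()
        }
    if isinstance(data, (list, tuple)):
        # Rebuild the same container type so tuples stay tuples
        return type(data)(map_simple_dict_values(item, modfn) for item in data)
    # Bare values inside lists or tuples have no key, so leave them alone
    return data


def rot13_passwords(key, value):
    # The 'rot13' str.encode() codec is Python 2 only; codecs.encode still has it
    if 'password' in key:
        value = codecs.encode(value, 'rot_13')
    return value
```

Returning a copy means the original parsed yaml data is still available afterwards, at the cost of building a second structure.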

How to remove ^M characters from a file with Python

2011-10-03T11:26:03-07:00

Use the following Python script to remove ^M (carriage return) characters from your file and replace them with newline characters only. To do this in Emacs, see my notes here.

remove_ctrl_m_chars.py:

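import shutil
import sys
import tempfile


def main():
    filename = sys.argv[1]
    # Read and write bytes so the script behaves the same on Python 2 and 3
    with tempfile.NamedTemporaryFile(delete=False) as fh:
        with open(filename, 'rb') as infile:
            for line in infile:
                # Strip only the line ending, keeping other trailing whitespace
                fh.write(line.rstrip(b'\r\n') + b'\n')
    # Swap the files after the temporary file is closed. shutil.move also
    # works when the temporary directory is on a different filesystem.
    shutil.move(filename, filename + '.bak')
    shutil.move(fh.name, filename)


if __name__ == '__main__':
    main()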

Run it

$ python remove_ctrl_m_chars.py myfile.txt

Notes on sqlalchemy w/ pyodbc, freetds on Ubuntu

2011-09-28T19:03:19-07:00

These are my notes for connecting to a MS SQL Server using Python on Linux. My development environment is Ubuntu 10.10 Maverick (64-bit). The production environment is Redhat/Centos-based Scientific Linux 6. Here are the layers involved:

FreeTDS: an open source ODBC driver. Started with version 0.82 and moved to 0.91. Other options: the Microsoft ODBC driver for Linux (free, currently only for 64-bit Redhat) and the Easysoft ODBC driver (commercial, not free).
unixODBC: driver manager, a thin wrapper around the ODBC driver. Version 2.2 (may move to version 2.3 if we use the Microsoft driver).
pyodbc: python library. Started with 2.1.11, currently using 3.0.5. I chose pyodbc over pymssql because, after minimal research, it seemed to be the preferred choice. Other options: pymssql (GPL, doesn't use ODBC), mxODBC (commercial, much better documentation, may be more robust)
SQLAlchemy: wraps pyodbc, provides an ORM and better interface. I chose to use sqlalchemy because it supported named parameters. pyodbc only supported the 'qmark' paramstyle and pymssql returns 'pyformat' as the paramstyle, but it seems to actually be 'format'. See PEP-0249 for more information on paramstyle. See http://www.sqlalchemy.org/docs/core/expression_api.html#sqlalchemy.sql.expression.text for more information on using named parameters.
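
To make the paramstyle difference concrete, here is a small sketch using the sqlite3 module from the standard library as a stand-in. This is an assumption for illustration only: sqlite3 happens to accept both 'qmark' and 'named' placeholders, so it can show both styles without a SQL Server. It is not how pyodbc or SQLAlchemy are used in the rest of these notes, and the jobs table is made up.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE jobs (name TEXT, port INTEGER)')

# 'qmark' style: positional placeholders, values passed as a sequence
conn.execute('INSERT INTO jobs (name, port) VALUES (?, ?)', ('tootsiepop', 22))

# 'named' style: placeholders carry names, values passed as a mapping
row = conn.execute(
    'SELECT name, port FROM jobs WHERE name = :name AND port = :port',
    {'name': 'tootsiepop', 'port': 22},
).fetchone()

# The module-level paramstyle attribute is the one PEP 249 describes
print(sqlite3.paramstyle, row)
```

With named parameters, the order of the values no longer has to match the order of the placeholders, which is what I wanted from SQLAlchemy.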

Install (on Ubuntu)
Try to connect using tsql (FreeTDS)
Try to connect using Python
Error inserting unicode
Pyodbc SELECT -1 bug
How to list tables in the database with sqlalchemy
pyodbc/FreeTDS truncates a varchar(max) upon insert instead of raising an error (Install FreeTDS 0.91 from source)
Segfault error when using Free TDS 0.91 and pyodbc 2.1.11
What is the MS SQL server equivalent of MySQL's LIMIT in a SELECT
How to get the versions of stuff
Install the Microsoft ODBC driver for Linux
Install mxodbc
unixODBC GUI tools
Install SQL Developer on Ubuntu
Calling sprocs from sqlalchemy and pyodbc
Other misc errors
Install pyodbc on Scientific Linux 6
See also
How to find the location of the odbcinst.ini file

Install (on Ubuntu)

Set up odbc, freetds. Notes thanks to http://www.pauldeden.com/2008/12/how-to-setup-pyodbc-to-connect-to-mssql.html

$ sudo apt-get install unixodbc unixodbc-dev freetds-dev tdsodbc

Edit /etc/odbcinst.ini:

[FreeTDS]
Description = TDS driver (Sybase/MS SQL)
Driver = /usr/lib/odbc/libtdsodbc.so
Setup = /usr/lib/odbc/libtdsS.so
CPTimeout =
CPReuse =
TDS Version = 7.2
client charset = utf-8

UPDATE 2012-07-09: On Ubuntu 12.04 (64-bit), the libraries are in a different location. Also, Ubuntu 12.04 includes FreeTDS version 0.91.

[FreeTDS]
Description = TDS driver (Sybase/MS SQL)
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
CPTimeout =
CPReuse =
TDS Version = 7.2
client charset = utf-8
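
When a connection fails, it can help to double check which driver library a section of odbcinst.ini actually points to. The sketch below parses the Ubuntu 12.04 example above with the standard library configparser module (Python 3). It assumes the file is plain INI syntax that configparser accepts, which is true for this example but may not hold for every odbcinst.ini. The helper name driver_settings is my own.

```python
import configparser

ODBCINST_INI = """\
[FreeTDS]
Description = TDS driver (Sybase/MS SQL)
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
CPTimeout =
CPReuse =
TDS Version = 7.2
client charset = utf-8
"""


def driver_settings(ini_text, section='FreeTDS'):
    """Return the options of one driver section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # configparser lowercases option names, so 'Driver' becomes 'driver'
    return dict(parser[section])


settings = driver_settings(ODBCINST_INI)
print(settings['driver'])
print(settings['tds version'])
```

To check the real file instead of a string, parser.read('/etc/odbcinst.ini') could be used in place of read_string.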

Install pyodbc

$ pip install pyodbc

Install sqlalchemy

$ pip install SQLAlchemy

This installed the following versions:

$ apt-cache show tdsodbc | grep Version: 
Version: 0.82-7

$ odbcinst --version 
unixODBC 2.2.14

$ pip freeze | grep pyodbc 
pyodbc==2.1.11

$ pip freeze | grep Alchemy
SQLAlchemy==0.7.3

Try to connect using tsql (FreeTDS)

Run the following command to ensure you can connect using tsql. For more information, see http://www.freetds.org/userguide/confirminstall.htm. See also: FreeTDS User Guide: Chapter 8. Troubleshooting: Is the server there?

$ tsql -S 10.2.14.18 -U myusername -P mypassword
locale is "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=en_US.UTF-8;LC_TIME=en_US.UTF-8;LC_COLLATE=C;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=en_US.UTF-8;LC_ADDRESS=en_US.UTF-8;LC_TELEPHONE=en_US.UTF-8;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=en_US.UTF-8"
locale charset is "UTF-8"
using default charset "UTF-8"
1>

Try to connect using Python

pyodbc - this works

import pyodbc
conn = pyodbc.connect(
    'DRIVER={FreeTDS};SERVER=dev01.myserver.com;'
    'DATABASE=MyDatabase;UID=MyUsername;PWD=mypassword;port=1234;'
    'TDS_Version=8.0;')
print conn

sqlalchemy - this doesn't work

import sqlalchemy
engine = sqlalchemy.create_engine(
    'mssql+pyodbc://MyUsername:mypassword@dev01.myserver.com:1234/MyDatabase')
conn = engine.connect()
print conn

Traceback (most recent call last):
  File "debug_sqlalchemy.py", line 22, in <module>
    conn = engine.connect()
  File "/home/saltycrane/.virtualenvs/myproject/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 2310, in connect
    return self._connection_cls(self, **kwargs)
  File "/home/saltycrane/.virtualenvs/myproject/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 872, in __init__
    self.__connection = connection or engine.raw_connection()
  File "/home/saltycrane/.virtualenvs/myproject/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 2396, in raw_connection
    return self.pool.unique_connection()
  File "/home/saltycrane/.virtualenvs/myproject/lib/python2.6/site-packages/sqlalchemy/pool.py", line 169, in unique_connection
    return _ConnectionFairy(self).checkout()
  File "/home/saltycrane/.virtualenvs/myproject/lib/python2.6/site-packages/sqlalchemy/pool.py", line 370, in __init__
    rec = self._connection_record = pool._do_get()
  File "/home/saltycrane/.virtualenvs/myproject/lib/python2.6/site-packages/sqlalchemy/pool.py", line 696, in _do_get
    con = self._create_connection()
  File "/home/saltycrane/.virtualenvs/myproject/lib/python2.6/site-packages/sqlalchemy/pool.py", line 174, in _create_connection
    return _ConnectionRecord(self)
  File "/home/saltycrane/.virtualenvs/myproject/lib/python2.6/site-packages/sqlalchemy/pool.py", line 255, in __init__
    self.connection = self.__connect()
  File "/home/saltycrane/.virtualenvs/myproject/lib/python2.6/site-packages/sqlalchemy/pool.py", line 315, in __connect
    connection = self.__pool._creator()
  File "/home/saltycrane/.virtualenvs/myproject/lib/python2.6/site-packages/sqlalchemy/engine/strategies.py", line 80, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/saltycrane/.virtualenvs/myproject/lib/python2.6/site-packages/sqlalchemy/engine/default.py", line 275, in connect
    return self.dbapi.connect(*cargs, **cparams)
sqlalchemy.exc.DBAPIError: (Error) ('IM002', '[IM002] [unixODBC][Driver Manager]Data source name not found, and no default driver specified (0) (SQLDriverConnectW)') None None

sqlalchemy - this works

import sqlalchemy
import urllib
engine = sqlalchemy.create_engine(
    'mssql+pyodbc:///?odbc_connect=%s' % (
        urllib.quote_plus(
            'DRIVER={FreeTDS};SERVER=dev01.myserver.com;'
            'DATABASE=MyDatabase;UID=MyUsername;PWD=mypassword;port=1234;'
            'TDS_Version=8.0;')))
conn = engine.connect()
print conn

sqlalchemy - this works also

import pyodbc
import sqlalchemy
def connect():
    return pyodbc.connect(
        'DRIVER={FreeTDS};SERVER=dev01.myserver.com;'
        'DATABASE=MyDatabase;UID=MyUsername;PWD=mypassword;port=1234;'
        'TDS_Version=8.0;')
engine = sqlalchemy.create_engine('mssql://', creator=connect)
conn = engine.connect()
print conn

You can also connect by specifying the path to the driver directly and bypass /etc/odbcinst.ini. Not sure of the disadvantages of doing this.

import urllib

import sqlalchemy

SQL_SERVER_HOST = '10.5.18.11'
SQL_SERVER_UID = 'myusername'
SQL_SERVER_PWD = 'mypassword'
FREETDS_DRIVER = '/usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so'

before_quoted = ';'.join(
    [
        'DRIVER={}'.format(FREETDS_DRIVER),
        'SERVER={}'.format(SQL_SERVER_HOST),
        'DATABASE=mydatabase',
        'UID={}'.format(SQL_SERVER_UID),
        'PWD={}'.format(SQL_SERVER_PWD),
        'PORT=1433',
        'TDS_VERSION=7.2'
        ]
    )
print before_quoted + '\n'
connection_string = 'mssql+pyodbc:///?odbc_connect={}'.format(
    urllib.quote_plus(before_quoted))
print connection_string + '\n'
engine = sqlalchemy.create_engine(connection_string)
print engine

DRIVER=/usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so;SERVER=10.5.18.11;DATABASE=mydatabase;UID=myusername;PWD=mypassword;PORT=1433;TDS_VERSION=7.2

mssql+pyodbc:///?odbc_connect=DRIVER%3D%2Fusr%2Flib%2Fx86_64-linux-gnu%2Fodbc%2Flibtdsodbc.so%3BSERVER%3D10.5.18.11%3BDATABASE%3Dmydatabase%3BUID%3Dmyusername%3BPWD%3Dmypassword%3BPORT%3D1433%3BTDS_VERSION%3D7.2

Engine(mssql+pyodbc:///?odbc_connect=DRIVER=/usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so;SERVER=10.5.18.11;DATABASE=mydatabase;UID=myusername;PWD=mypassword;PORT=1433;TDS_VERSION=7.2)
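
On Python 3, urllib.quote_plus has moved to urllib.parse.quote_plus. Here is a sketch of the same URL-building step as a small helper that needs neither SQLAlchemy nor pyodbc to run. The function name odbc_connect_url is my own, and it makes no attempt to handle values that themselves contain a semicolon.

```python
from urllib.parse import quote_plus  # Python 2: from urllib import quote_plus


def odbc_connect_url(pairs):
    """Join (key, value) pairs into an ODBC connection string and wrap it
    in the mssql+pyodbc URL form that uses the odbc_connect parameter."""
    raw = ';'.join('{}={}'.format(key, value) for key, value in pairs)
    return 'mssql+pyodbc:///?odbc_connect={}'.format(quote_plus(raw))


url = odbc_connect_url([
    ('DRIVER', '/usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so'),
    ('SERVER', '10.5.18.11'),
    ('DATABASE', 'mydatabase'),
    ('UID', 'myusername'),
    ('PWD', 'mypassword'),
    ('PORT', '1433'),
    ('TDS_VERSION', '7.2'),
])
print(url)
```

The resulting string is what gets passed to sqlalchemy.create_engine, as in the examples above.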

Error inserting unicode

If you get this "Error: The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect." error during insert, it may be due to unicode in your data.

Traceback (most recent call last):
  File "tmp.py", line 10, in <module>
    (unicode(u'some unicode data: \xa1').encode("utf-8"), '123', '123456'),
pyodbc.ProgrammingError: ('42000', '[42000] [FreeTDS][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 3: The parameter name is invalid. (8005) (SQLExecDirectW)')

The fix for me was to change the TDS version number. "7.0" and "8.0" did not work; "7.1", "7.2", "9.0", and some others did. There is this FreeTDS user guide page on version numbers, and also this other user guide page, but I'm not sure I'm clear on it. The following works for me:

import pyodbc

conn = pyodbc.connect(
    'DRIVER={FreeTDS};SERVER=dev01.myserver.com;'
    'DATABASE=MyDatabase;UID=MyUsername;PWD=mypassword;port=1234;'
    'TDS_Version=7.2;')
cursor = conn.cursor()
cursor.execute(
    """exec MyStoredProc @Param1 = ?, @Param2 = ?, @Param3 = ?""",
    (unicode(u'some unicode data: \xa1').encode("utf-8"), '123', '123456'),
    )
conn.commit()

`Unrecognized server version info` warning

When I changed TDS_Version to 7.2, I got the following warning:

/myvirtualenv/lib/python2.6/site-packages/sqlalchemy/connectors/pyodbc.py:119: SAWarning: Unrecognized server version info '95.10.13055'.   Version specific behaviors may not function properly.   If using ODBC with FreeTDS, ensure server version 7.0 or 8.0, not 4.2, is configured in the FreeTDS configuration.

This bothered me for a while, but later, when I upgraded freetds to 0.91, this warning went away.

Select -1 bug

Lin alerted me to this bug in pyodbc 2.1.11: Issue 157: Negative integers are retrieved as positive integers. Executing a SELECT -1 returns 2^32-1 (4294967295). The solution is to apply this patch or upgrade to 3.0.x of pyodbc. I had trouble installing some versions of pyodbc. This is what worked for me:

$ pip install --upgrade -e git+https://code.google.com/p/pyodbc#egg=pyodbc

This gave me:

$ pip freeze | grep pyodbc 
-e git+https://code.google.com/p/pyodbc@404a3ba414ab15863c34db9c49ff0a02caa63600#egg=pyodbc-dev

$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyodbc
>>> pyodbc.version
'3.0.4-beta01'
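
As a side note on where 4294967295 comes from: it is what the bit pattern of a signed 32-bit -1 looks like when read as an unsigned 32-bit integer. I don't know the exact cause inside pyodbc, so the sketch below only demonstrates the arithmetic, using the standard library struct module.

```python
import struct

# Pack -1 as a signed 32-bit integer, then read the same 4 bytes as unsigned
raw = struct.pack('<i', -1)
as_unsigned = struct.unpack('<I', raw)[0]

print(as_unsigned)               # 4294967295
print(as_unsigned == 2**32 - 1)  # True
```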

How to list the tables in the database

import urllib

import sqlalchemy


# Recommended to be a module-level variable
# http://www.sqlalchemy.org/docs/core/connections.html
engine = sqlalchemy.create_engine(
    'mssql+pyodbc:///?odbc_connect=%s' % (
        urllib.quote_plus(
            'DRIVER={FreeTDS};SERVER=myserver.mydomain.com;'
            'DATABASE=MyDatabase;UID=myusername;PWD=mypassword;port=12345;'
            'TDS_Version=7.2;')))


metadata = sqlalchemy.MetaData()
metadata.bind = engine
metadata.reflect()
for t in metadata.sorted_tables:
    print t.name

pyodbc/FreeTDS truncates a varchar(max) upon insert instead of raising an error

When trying to insert a string with more than 100 characters into a varchar(100) field, pyodbc/FreeTDS truncated the string instead of raising an error. This occurred because I was using FreeTDS version 0.82. I found this out from this FreeTDS mailing list thread. The solution is to upgrade to FreeTDS version 0.91. I could not find any APT packages for Ubuntu 10.10. Looks like there is a package for Ubuntu Precise (12.04). So I compiled it from source. This assumes you've installed unixodbc as described above:

$ sudo apt-get install unixodbc unixodbc-dev

$ cd /tmp 
$ wget http://ibiblio.org/pub/Linux/ALPHA/freetds/stable/freetds-stable.tgz 
$ tar zxvf freetds-stable.tgz 
$ cd freetds-0.91 
$ ./configure
$ make 
$ sudo make install

Then edit odbcinst.ini to point at the new library /usr/local/lib/libtdsodbc.so:

[FreeTDS]
Description = TDS driver (Sybase/MS SQL)
Driver = /usr/local/lib/libtdsodbc.so
Setup = /usr/lib/odbc/libtdsS.so
CPTimeout =
CPReuse =
TDS Version = 7.2
client charset = utf-8

After upgrading, pyodbc now gives me the following error instead of failing silently.

pyodbc.DataError: ('22001', '[22001] [FreeTDS][SQL Server]String or binary data would be truncated. (8152) (SQLExecDirectW)')

Segfault error when using FreeTDS 0.91 and pyodbc 2.1.11

Got this error trying to connect to a database on production (Scientific Linux 6)

*** glibc detected *** python: free(): invalid next size (fast): 0x00000000019ed440 ***

The solution was to upgrade pyodbc to 3.0.5, because the crash was caused by a bug in the older pyodbc version.

What is the MS SQL server equivalent of MySQL's LIMIT in a SELECT

Use TOP, which takes the number of rows to return. See http://www.w3schools.com/sql/sql_top.asp

SELECT TOP 10 * FROM mytable

How to get the versions of stuff

unixODBC

$ odbcinst --version
unixODBC 2.3.0

$ odbc_config --version
2.3.0

freetds

$ tsql -C
Compile-time settings (established with the "configure" script)
                            Version: freetds v0.91
             freetds.conf directory: /etc
     MS db-lib source compatibility: yes
        Sybase binary compatibility: yes
                      Thread safety: yes
                      iconv library: yes
                        TDS version: 4.2
                              iODBC: no
                           unixodbc: yes
              SSPI "trusted" logins: no
                           Kerberos: no

pyodbc

$ python
Python 2.7.0+ (r27:82500, Sep 15 2010, 18:14:55) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyodbc
>>> pyodbc.version
'3.0.5'

Trying out the Microsoft ODBC driver for Linux

This was installed on Scientific Linux 6. Ubuntu is not supported.

Install make

$ yum install make

Followed instructions to install the unixODBC driver manager and the driver. It installed OK.

$ sudo su root
$ tar xvf sqlncli-11.0.1720.0.tar.gz 
$ cd sqlncli-11.0.1720.0
$ ./build_dm.sh --help
$ ./build_dm.sh --download-url=file://../unixODBC-2.3.0.tar.gz
$ cd /tmp/unixODBC.8922.6568.26033/unixODBC-2.3.0; make install

$ sudo su root
$ tar xvf sqlncli-11.0.1720.0.tar.gz 
$ cd sqlncli-11.0.1720.0
$ ./install.sh --help
$ cp /etc/odbcinst.ini /etc/odbcinst.ini.bak
$ ./install.sh verify
Microsoft SQL Server ODBC Driver V1.0 for Linux Installation Script
Copyright Microsoft Corp.

Starting install for Microsoft SQL Server ODBC Driver V1.0 for Linux

Checking for 64 bit Linux compatible OS ..................................... OK
Checking required libs are installed ........................................ OK
unixODBC utilities (odbc_config and odbcinst) installed ..................... OK
unixODBC Driver Manager version 2.3.0 or later installed .................... OK
unixODBC Driver Manager configuration correct .............................. OK*
Microsoft SQL Server ODBC Driver V1.0 for Linux already installed .... NOT FOUND

$ ./install.sh install
...
Enter YES to accept the license or anything else to terminate the installation:  YES

Checking for 64 bit Linux compatible OS ..................................... OK
Checking required libs are installed ........................................ OK
unixODBC utilities (odbc_config and odbcinst) installed ..................... OK
unixODBC Driver Manager version 2.3.0 or later installed .................... OK
unixODBC Driver Manager configuration correct .............................. OK*
Microsoft SQL Server ODBC Driver V1.0 for Linux already installed .... NOT FOUND
Microsoft SQL Server ODBC Driver V1.0 for Linux files copied ................ OK
Symbolic links for bcp and sqlcmd created ................................... OK
Microsoft SQL Server ODBC Driver V1.0 for Linux registered ........... INSTALLED

Install log created at /tmp/sqlncli.31091.8029.6563/install.log.

One or more steps may have an *. See README for more information regarding
these steps.

$ odbcinst -q -d -n "SQL Server Native Client 11.0"
[SQL Server Native Client 11.0]
Description=Microsoft SQL Server ODBC Driver V1.0 for Linux
Driver=/opt/microsoft/sqlncli/lib64/libsqlncli-11.0.so.1720.0
UsageCount=1

Connection string: Driver is "SQL Server Native Client 11.0". Port is specified with a comma after the host. Thanks to http://www.connectionstrings.com/sql-server-2008

        connection = pyodbc.connect(
            ';'.join([
                    'DRIVER={SQL Server Native Client 11.0}',
                    'SERVER=%s,%s' % (db_profile['host'], db_profile['port']),
                    'DATABASE=%s' % db_profile['name'],
                    'UID=%s' % db_profile['user'],
                    'PWD=%s' % db_profile['password'],
                    ]) + ';')
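
As a sketch, the same string can be assembled by a small helper, which makes the comma between host and port easy to check without a database. The function name build_connection_string and the sample profile values are hypothetical. Only the driver name and the key names come from the snippet above. It is written for Python 3.

```python
def build_connection_string(db_profile, driver="SQL Server Native Client 11.0"):
    """Return an ODBC connection string; the port follows the host after a comma."""
    parts = [
        "DRIVER={%s}" % driver,
        "SERVER=%s,%s" % (db_profile["host"], db_profile["port"]),
        "DATABASE=%s" % db_profile["name"],
        "UID=%s" % db_profile["user"],
        "PWD=%s" % db_profile["password"],
    ]
    return ";".join(parts) + ";"


# Hypothetical profile values, for illustration only
example_profile = {
    "host": "myhost.com",
    "port": 1433,
    "name": "my_database",
    "user": "myusername",
    "password": "mypassword",
}
print(build_connection_string(example_profile))
```

The result can be passed to pyodbc.connect() in place of the inline join.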

Ran my app and got the following error message:

Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib '/opt/microsoft/sqlncli/lib64/libsqlncli-11.0.so.1720.0' : file not found (0) (SQLDriverConnect)")

That's a dumb error message. /opt/microsoft/sqlncli/lib64/libsqlncli-11.0.so.1720.0 existed and it had the correct permissions. Turns out it was missing dependencies. Google help: http://mailman.unixodbc.org/pipermail/unixodbc-support/2011-July/002902.html and http://www.linuxquestions.org/questions/fedora-35/how-to-install-libcrypto-so-4-a-594511/ .

$ ldd /opt/microsoft/sqlncli/lib64/libsqlncli-11.0.so.1720.0
        linux-vdso.so.1 =>  (0x00007fff5258d000)
        libcrypto.so.6 => not found
        libdl.so.2 => /lib64/libdl.so.2 (0x00007feec91fd000)
        libodbc.so.1 => /usr/lib64/libodbc.so.1 (0x00007feec8f90000)
        librt.so.1 => /lib64/librt.so.1 (0x00007feec8d88000)
        libssl.so.6 => not found
        libuuid.so.1 => /lib64/libuuid.so.1 (0x00007feec8b83000)
        libodbcinst.so.1 => /usr/lib64/libodbcinst.so.1 (0x00007feec896c000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007feec868c000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007feec844b000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007feec8144000)
        libm.so.6 => /lib64/libm.so.6 (0x00007feec7ebf000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007feec7ca9000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007feec7a8d000)
        libc.so.6 => /lib64/libc.so.6 (0x00007feec76eb000)
        /lib64/ld-linux-x86-64.so.2 (0x00007feec97b0000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007feec74bf000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007feec72bc000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007feec70b1000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007feec6eaf000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007feec6c95000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007feec6a75000)

libcrypto.so.6 and libssl.so.6 are missing.
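
Picking the "not found" lines out of a long ldd listing by eye is error-prone. Here is a minimal Python 3 sketch that does it. The helper names missing_libraries and ldd are my own, and the sample text is a few lines copied from the output above.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.partition("=>")
        if sep and target.strip() == "not found":
            missing.append(name.strip())
    return missing


def ldd(path):
    """Run ldd on a shared library and return its text output."""
    return subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    ).stdout


# Sample lines copied from the ldd output above
sample = """\
        libcrypto.so.6 => not found
        libdl.so.2 => /lib64/libdl.so.2 (0x00007feec91fd000)
        libssl.so.6 => not found
"""
print(missing_libraries(sample))
```

On a real system, missing_libraries(ldd('/path/to/driver.so')) should return an empty list once the dependencies are installed.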

$ yum install openssl098e
$ ldd /opt/microsoft/sqlncli/lib64/libsqlncli-11.0.so.1720.0                 
        linux-vdso.so.1 =>  (0x00007fff5c7fe000)
        libcrypto.so.6 => /usr/lib64/libcrypto.so.6 (0x00007f7ebcdd8000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f7ebcbd4000)
        libodbc.so.1 => /usr/lib64/libodbc.so.1 (0x00007f7ebc966000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f7ebc75e000)
        libssl.so.6 => /usr/lib64/libssl.so.6 (0x00007f7ebc50f000)
        libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f7ebc30a000)
        libodbcinst.so.1 => /usr/lib64/libodbcinst.so.1 (0x00007f7ebc0f3000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f7ebbe14000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f7ebbbd1000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f7ebb8ca000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f7ebb646000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f7ebb42f000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7ebb213000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f7ebae72000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f7ebac5c000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f7ebd4e3000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f7ebaa59000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f7eba82d000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f7eba621000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f7eba41f000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f7eba205000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f7eb9fe5000)

Good. It works.

Install mxodbc

$ python -c 'print("UCS%s"%len(u"x".encode("unicode-internal")))'
UCS4
$ pip install http://downloads.egenix.com/python/egenix-mx-base-3.2.3.linux-x86_64-py2.7_ucs4.prebuilt.zip 
$ pip install http://downloads.egenix.com/python/egenix-mxodbc-3.1.1.linux-x86_64-py2.7_ucs4.prebuilt.zip 
$ #copy license files to "site-packages" directory
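
An alternative way to check the Unicode build, as a sketch: sys.maxunicode is 0x10FFFF on wide (UCS4) builds and 0xFFFF on narrow (UCS2) builds of Python 2. On Python 3.3 and later it is always 0x10FFFF, so the distinction only matters for older interpreters like the 2.7 one used here. The function name unicode_build is hypothetical.

```python
import sys


def unicode_build():
    """Return 'UCS4' for wide builds and 'UCS2' for narrow builds."""
    return "UCS4" if sys.maxunicode > 0xFFFF else "UCS2"


print(unicode_build())
```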

unixODBC GUI tools

I didn't end up actually using this. I think this is useful if you need to set up DSNs (e.g. if you're using mxodbc).

$ sudo apt-get install unixodbc-bin

Description: Graphical tools for ODBC management and browsing This package contains three graphical applications for use with unixODBC, the Open DataBase Connectivity suite: ODBCConfig, a graphical configuration tool for managing database drivers and access to individual databases; DataManager, a simple browser and query tool for ODBC databases; and odbctest, a tool for testing the ODBC API itself.

Provides:

/usr/bin/DataManager
/usr/bin/DataManagerII
/usr/bin/ODBCConfig
/usr/bin/odbctest

Installing SQL Developer on Ubuntu

For a GUI interface to MS SQL Server, I like Oracle's SQL Developer better than Eclipse. Thanks Kris. (Too bad it doesn't execute stored procedures.)

 - $ sudo apt-get install openjdk-6-jdk
 - Create an Oracle account
 - Download "Oracle SQL Developer for other platforms" from http://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/index.html
 - $ unzip sqldeveloper-3.1.07.42-no-jre.zip
 - $ cd sqldeveloper
 - $ sh sqldeveloper.sh 
Type the full pathname of a J2SE installation (or Ctrl-C to quit), the path will be stored in ~/.sqldeveloper/jdk
/usr/lib/jvm/java-6-openjdk

http://djiang.wordpress.com/2007/02/27/using-oracle-sqldeveloper-to-access-sqlserver/
Download jTDS from here: http://sourceforge.net/projects/jtds/
mkdir jtds-1.2.5
mv jtds-1.2.5-dist.zip jtds-1.2.5
cd jtds-1.2.5
unzip jtds-1.2.5-dist.zip
 - Start Oracle’s SQLDeveloper, Tools->Preferences->Database->Third Party JDBC Drivers. Click “Add Entry” and point to the jtds-1.2.jar
 - File -> New... -> Database Connection -> choose SQLServer tab, type in hostname, port, username and password. It appears that the initial connection name has to be the same as the database and you can click the “Retrieve database” button. Once you found the database, you can rename the connection.

Calling stored procs from sqlalchemy and pyodbc

connection_string = ';'.join([
        'DRIVER={FreeTDS}',
        'SERVER=myhost.com',
        'DATABASE=my_database',
        'UID=myusername',
        'PWD=mypassword',
        'port=1433',
        'TDS_Version=7.2',
        ]) + ';'
print connection_string


def use_pyodbc():
    import pyodbc

    connection = pyodbc.connect(connection_string)
    print connection

    sql = " ".join([
            "EXEC [dbo].my_sproc",
            "@my_param = ?",
            ])
    print sql
    cursor = connection.cursor()
    result = cursor.execute(sql, ('my_value',))
    for r in result.fetchall():
        print r
        print r.a_param
    connection.commit()


def use_sqlalchemy():
    import urllib
    import sqlalchemy

    engine = sqlalchemy.create_engine(
        'mssql+pyodbc:///?odbc_connect=%s' % (
            urllib.quote_plus(connection_string)))
    print engine
    sql = sqlalchemy.sql.text(" ".join([
            "EXEC [dbo].my_sproc",
            "@my_param = :my_param",
            ]))
    print sql
    result = engine.execute(sql, {'my_param': 'my_value'})
    for r in result.fetchall():
        for k in r.keys():
            try:
                print '%50s %s' % (k, getattr(r, k))
            except sqlalchemy.exc.InvalidRequestError as e:
                print e
        break
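
The code above is Python 2. On Python 3, quote_plus lives in urllib.parse. Here is a minimal sketch of building the same SQLAlchemy URL, with no database or SQLAlchemy import needed. The helper name make_mssql_url is hypothetical.

```python
from urllib.parse import quote_plus, unquote_plus


def make_mssql_url(connection_string):
    """Wrap a raw ODBC connection string in a mssql+pyodbc URL."""
    return "mssql+pyodbc:///?odbc_connect=%s" % quote_plus(connection_string)


connection_string = ";".join([
    "DRIVER={FreeTDS}",
    "SERVER=myhost.com",
    "DATABASE=my_database",
    "UID=myusername",
    "PWD=mypassword",
    "port=1433",
    "TDS_Version=7.2",
]) + ";"

url = make_mssql_url(connection_string)
print(url)
```

The URL is what would be handed to sqlalchemy.create_engine(). Quoting is needed because the connection string contains characters such as ; and = that are not safe in a URL query.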

Install pyodbc on Scientific Linux 6

You will also need FreeTDS installed.

$ yum install unixODBC-devel 
$ yum install gcc-c++ 
$ pip install pyodbc

Other misc errors

ProgrammingError: (ProgrammingError) ('42000', '[42000] [FreeTDS][SQL Server]Statement(s) could not be prepared. (8180) (SQLExecDirectW)')

Problem was missing commas.

```
DBAPIError: (Error) ('IM002', '[IM002] [unixODBC][Driver Manager]Data source name not found, and no default driver specified (0) (SQLDriverConnect)') None None
```
- This occurred when I could not connect to the database server. To troubleshoot this, see: FreeTDS User Guide: Chapter 8. Troubleshooting: Is the server there?
- This also occurred when I did not configure /etc/odbcinst.ini correctly as described in the Install section

How to find the location of the odbcinst.ini file

$ odbcinst -j
unixODBC 2.3.0
DRIVERS............: /etc/odbcinst.ini
SYSTEM DATA SOURCES: /etc/odbc.ini
FILE DATA SOURCES..: /etc/ODBCDataSources
USER DATA SOURCES..: /home/eliot/.odbc.ini
SQLULEN Size.......: 8
SQLLEN Size........: 8
SQLSETPOSIROW Size.: 8
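
If a script needs the odbcinst.ini location, the output above is easy to parse. Here is a sketch in Python 3. The function name parse_odbcinst_j is hypothetical, and the sample text is copied from the output above.

```python
def parse_odbcinst_j(output):
    """Map each label in `odbcinst -j` output to its value."""
    settings = {}
    for line in output.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            # labels are padded with dots, e.g. "DRIVERS............"
            settings[label.rstrip(". ")] = value.strip()
    return settings


sample = """\
unixODBC 2.3.0
DRIVERS............: /etc/odbcinst.ini
SYSTEM DATA SOURCES: /etc/odbc.ini
FILE DATA SOURCES..: /etc/ODBCDataSources
USER DATA SOURCES..: /home/eliot/.odbc.ini
"""
print(parse_odbcinst_j(sample)["DRIVERS"])
```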

Trace odbc

http://www.easysoft.com/support/kb/kb00945.html Add the following to the odbcinst.ini file. Be sure to turn it off after debugging.

[ODBC]
Trace = yes
TraceFile = /tmp/my_odbc_tracfile.log

Remove leading and trailing whitespace from a csv file with Python

2011-08-12T12:03:55-07:00

I'm reading a csv file with the Python csv module and could not find a setting to remove trailing whitespace. I found this setting, Dialect.skipinitialspace, but I think it only applies to leading whitespace. Here's a one-liner to delete leading and trailing whitespace that worked for me.

import csv


reader = csv.DictReader(
    open('myfile.csv'),
    fieldnames=('myfield1', 'myfield2', 'myfield3'),
)

# skip the header row
next(reader)

# remove leading and trailing whitespace from all values
reader = (
    dict((k, v.strip()) for k, v in row.items() if v) for row in reader)

# print results
for row in reader:
    print row
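
For reference, here is a sketch of the same idea in Python 3, using an in-memory file so it runs without myfile.csv. The function name read_stripped and the sample data are made up for illustration.

```python
import csv
import io


def read_stripped(fileobj, fieldnames):
    """Yield one dict per data row with whitespace stripped from every value."""
    reader = csv.DictReader(fileobj, fieldnames=fieldnames)
    next(reader)  # skip the header row
    for row in reader:
        # as in the original one-liner, empty values are dropped from the dict
        yield {k: v.strip() for k, v in row.items() if v}


sample = io.StringIO("a,b,c\n  1 , 2,3  \n x ,,z\n")
rows = list(read_stripped(sample, ("myfield1", "myfield2", "myfield3")))
for row in rows:
    print(row)
```

Note that the `if v` filter removes empty fields entirely, so the second row has no 'myfield2' key.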

Example parsing XML with lxml.objectify

2011-07-19T19:21:30-07:00

Example run with lxml 2.3, Python 2.6.6 on Ubuntu 10.10

from lxml import objectify, etree

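xml = '''
<dataset>
  <statusthing>success</statusthing>
  <datathing gabble="sent">joe@email.com</datathing>
  <datathing gabble="not sent"></datathing>
</dataset>
'''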

root = objectify.fromstring(xml)

print root.tag
print root.text
print root.attrib
# dataset
# None
# {}

print root.statusthing.tag
print root.statusthing.text
print root.statusthing.attrib
# statusthing
# success
# {}

for e in root.datathing:
    print e.tag
    print e.text
    print e.attrib
    print e.attrib['gabble']
# datathing
# joe@email.com
# {'gabble': 'sent'}
# sent
# datathing
# None
# {'gabble': 'not sent'}
# not sent

for e in root.getchildren():
    print e.tag
# statusthing
# datathing
# datathing

for e in root.iterchildren():
    print e.tag
# statusthing
# datathing
# datathing

# you cannot modify the text attribute of an element.
# instead just assign to the element itself.
try:
    root.statusthing.text = 'failure'
except:
    import traceback
    traceback.print_exc()
# Traceback (most recent call last):
#   File "lxml_ex.py", line 54, in <module>
#     root.statusthing.text = 'failure'
#   File "lxml.objectify.pyx", line 237, in lxml.objectify.ObjectifiedElement.__setattr__ (src/lxml/lxml.objectify.c:2980)
# TypeError: attribute 'text' of 'StringElement' objects is not writable

# modify element text and write it out as xml again
root.statusthing = 'failure'
xml_new = etree.tostring(root, pretty_print=True)
print xml_new
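# <dataset>
#   <statusthing xmlns:py="http://codespeak.net/lxml/objectify/pytype" py:pytype="str">failure</statusthing>
#   <datathing gabble="sent">joe@email.com</datathing>
#   <datathing gabble="not sent"/>
# </dataset>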

# Use deannotate() to get rid of 'py:pytype' information
objectify.deannotate(root, cleanup_namespaces=True)
xml_new = etree.tostring(root, pretty_print=True)
print xml_new
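# <dataset>
#   <statusthing>failure</statusthing>
#   <datathing gabble="sent">joe@email.com</datathing>
#   <datathing gabble="not sent"/>
# </dataset>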

# Add a child element to the root
c = etree.Element("thisdoesntmatter")
c.tag = "thisdoesntmattereither"
c.text = "mytext"
c.attrib['myattr'] = 'myvalue'
root.newchild = c
objectify.deannotate(root, cleanup_namespaces=True)
xml_new = etree.tostring(root, pretty_print=True)
print xml_new
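# <dataset>
#   <statusthing>failure</statusthing>
#   <datathing gabble="sent">joe@email.com</datathing>
#   <datathing gabble="not sent"/>
#   <newchild myattr="myvalue">mytext</newchild>
# </dataset>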

Notes on tracing code execution in Django and Python

2011-06-25T07:29:00-07:00

The trace module causes Python to print lines of code as they are executed. I learned about trace via @brandon_rhodes's tweet.

Trace a Python program

python -m trace -t myprogram.py myargs

Trace with a Django development server

From my experience, trace doesn't work with Django's auto-reloader. Use the --noreload option.

python -m trace -t manage.py runserver --noreload

Tracing with more control

This article shows how to write custom functions that are passed to sys.settrace.
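
To give a feel for what such a custom function looks like, here is a minimal Python 3 sketch (not taken from the article) that records the name of every Python function called while tracing is on. The names make_tracer, inner and outer are made up for the example.

```python
import sys


def make_tracer(log):
    """Build a trace function that appends called function names to log."""
    def tracer(frame, event, arg):
        if event == "call":
            log.append(frame.f_code.co_name)
        # returning None means: do not trace individual lines in this scope
        return None
    return tracer


def inner():
    return 1


def outer():
    return inner() + 1


calls = []
sys.settrace(make_tracer(calls))
try:
    outer()
finally:
    sys.settrace(None)  # always switch tracing off again

print(calls)
```

Returning the tracer itself instead of None would also deliver "line", "return" and "exception" events for each traced scope, which is presumably how line-by-line tracers get their output.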

Django trace tool, `django-trace`

I wrote a Django management command that uses sys.settrace with other Django management commands. https://github.com/saltycrane/django-trace.

Install

$ pip install -e git://github.com/saltycrane/django-trace.git#egg=django_trace

Add 'django_trace' to INSTALLED_APPS in settings.py

Usage

$ python manage.py trace --help 
Usage: manage.py trace [options] [command]

Use sys.settrace to trace code

Options:
  -v VERBOSITY, --verbosity=VERBOSITY
                        Verbosity level; 0=minimal output, 1=normal output,
                        2=verbose output, 3=very verbose output
  --settings=SETTINGS   The Python path to a settings module, e.g.
                        "myproject.settings.main". If this isn't provided, the
                        DJANGO_SETTINGS_MODULE environment variable will be
                        used.
  --pythonpath=PYTHONPATH
                        A directory to add to the Python path, e.g.
                        "/home/djangoprojects/myproject".
  --traceback           Print traceback on exception
  --include-builtins    Include builtin functions (default=False)
  --include-stdlib      Include standard library modules (default=False)
  --module-only         Display module names only (not lines of code)
  --calls-only          Display function calls only (not lines of code)
  --good=GOOD           Comma separated list of exact module names to match
  --bad=BAD             Comma separated list of exact module names to exclude
                        (takes precedence over --good and --good-regex)
  --good-regex=GOOD_REGEX
                        Regular expression of module to match
  --bad-regex=BAD_REGEX
                        Regular expression of module to exclude (takes
                        precedence over --good and --good-regex)
  --good-preset=GOOD_PRESET
                        A key in the GOOD_PRESETS setting
  --bad-preset=BAD_PRESET
                        A key in the BAD_PRESETS setting
  --version             show program's version number and exit
  -h, --help            show this help message and exit

$ python manage.py trace runserver 
01->django.core.management:128:     try:
01->django.core.management:129:         app_name = get_commands()[name]
02-->django.core.management:95:     if _commands is None:
02-->django.core.management:114:     return _commands
01->django.core.management:130:         if isinstance(app_name, BaseCommand):
01->django.core.management:134:             klass = load_command_class(app_name, name)
02-->django.core.management:69:     module = import_module('%s.management.commands.%s' % (app_name, name))
03--->django.utils.importlib:26:     if name.startswith('.'):
03--->django.utils.importlib:35:     __import__(name)
04---->django.contrib.staticfiles.management.commands.runserver:1: from optparse import make_option
04---->django.contrib.staticfiles.management.commands.runserver:3: from django.conf import settings
04---->django.contrib.staticfiles.management.commands.runserver:4: from django.core.management.commands.runserver import BaseRunserverCommand
05----->django.core.management.commands.runserver:1: from optparse import make_option
05----->django.core.management.commands.runserver:2: import os
05----->django.core.management.commands.runserver:3: import re
05----->django.core.management.commands.runserver:4: import sys
05----->django.core.management.commands.runserver:5: import socket
05----->django.core.management.commands.runserver:7: from django.core.management.base import BaseCommand, CommandError
05----->django.core.management.commands.runserver:8: from django.core.servers.basehttp import AdminMediaHandler, run, WSGIServerException, get_internal_wsgi_application
06------>django.core.servers.basehttp:8: """
06------>django.core.servers.basehttp:10: import os
06------>django.core.servers.basehttp:11: import socket
06------>django.core.servers.basehttp:12: import sys
06------>django.core.servers.basehttp:13: import traceback
...

$ python manage.py trace --bad=django,SocketServer --calls-only runserver 
01->wsgiref:23: """
01->wsgiref.simple_server:11: """
02-->BaseHTTPServer:18: """
03--->BaseHTTPServer:102: class HTTPServer(SocketServer.TCPServer):
03--->BaseHTTPServer:114: class BaseHTTPRequestHandler(SocketServer.StreamRequestHandler):
02-->wsgiref.handlers:1: """Base classes for server/gateway implementations"""
03--->wsgiref.util:1: """Miscellaneous WSGI-related Utilities"""
04---->wsgiref.util:11: class FileWrapper:
03--->wsgiref.headers:6: """
04---->wsgiref.headers:42: class Headers:
03--->wsgiref.handlers:43: class BaseHandler:
03--->wsgiref.handlers:371: class SimpleHandler(BaseHandler):
03--->wsgiref.handlers:412: class BaseCGIHandler(SimpleHandler):
03--->wsgiref.handlers:453: class CGIHandler(BaseCGIHandler):
02-->wsgiref.simple_server:26: class ServerHandler(SimpleHandler):
02-->wsgiref.simple_server:42: class WSGIServer(HTTPServer):
02-->wsgiref.simple_server:83: class WSGIRequestHandler(BaseHTTPRequestHandler):
01->contextlib:53: def contextmanager(func):
01->contextlib:53: def contextmanager(func):
Validating models...

0 errors found
Django version 1.4, using settings 'myproj.settings'
Development server is running at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
01->myproj.wsgi:15: """
01->wsgiref.simple_server:48:     def server_bind(self):
02-->BaseHTTPServer:106:     def server_bind(self):
02-->wsgiref.simple_server:53:     def setup_environ(self):
01->wsgiref.simple_server:53:     def setup_environ(self):
01->wsgiref.simple_server:66:     def set_app(self,application)

Trace decorator

I also wrote a decorator that traces code execution of the function it is decorating: https://github.com/saltycrane/trace-tools

Install

pip install -e git://github.com/saltycrane/trace-tools.git#egg=trace_tools

Usage

from trace_tools.decorators import trace

@trace()
def some_function_to_trace(arg):
    do_something()

@trace(max_level=2)
def some_function_to_trace(arg):
    do_something()

@another_decorator
@trace(
    max_level=4,
    ignore=(
        'httplib', 'logging', 'ssl', 'email', 'encodings', 'gzip', 'urllib',
        'multiprocessing', 'django', 'cgi', 'requests', 'cookielib', 'base64',
        'slumber', 'zipfile', 'redis'))
def some_other_function():
    do_something_else()

@trace(max_level=10, calls_only=False, ignore=('debugtools', 'blessings', 'ipdb', 'IPython',), ignore_builtins=True, ignore_stdlib=True)
def process(self, content):
    do_stuff()

@trace(max_level=115, calls_only=True, ignore=(
        'suds.resolver',
        'suds.sudsobject',
        'suds.xsd',
        'debugtools', 'blessings', 'ipdb', 'IPython',),
       ignore_builtins=True, ignore_stdlib=True)
def process(self, content):
    do_stuff()

(Not too successfully) trying to use Unix tools instead of Python utility scripts

2011-04-20T19:00:32-07:00

Inspired by articles such as Why you should learn just a little Awk and Learn one sed command, I am trying to make use of Unix tools sed, awk, grep, cut, uniq, sort, etc. instead of writing short Python utility scripts.

Here is a Python script I wrote this week. It greps a file for a given regular expression pattern and returns a unique, sorted, list of matches inside the capturing parentheses.

# grep2.py

import re
import sys


def main():
    patt = sys.argv[1]
    filename = sys.argv[2]

    text = open(filename).read()
    matchlist = set(m.group(1) for m in re.finditer(patt, text, re.MULTILINE))
    for m in sorted(matchlist):
        print m


if __name__ == '__main__':
    main()

As an example, I used my script to search one of the Django admin template files for all the Django template markup in the file.

$ python grep2.py '({{[^{}]+}}|{%[^{}]+%})' tabular.html

Output:

{% admin_media_prefix %}
{% blocktrans with inline_admin_formset.opts.verbose_name|title as verbose_name %}
{% cycle "row1" "row2" %}
{% else %}
{% endblocktrans %}
{% endfor %}
{% endif %}
{% endspaceless %}
{% for field in inline_admin_formset.fields %}
{% for field in line %}
{% for fieldset in inline_admin_form %}
{% for inline_admin_form in inline_admin_formset %}
{% for line in fieldset %}
{% if field.is_hidden %}
{% if field.is_readonly %}
{% if field.required %}
{% if forloop.first %}
{% if forloop.last %}
{% if inline_admin_form.form.non_field_errors %}
{% if inline_admin_form.has_auto_field %}
{% if inline_admin_form.original %}
{% if inline_admin_form.original or inline_admin_form.show_url %}
{% if inline_admin_form.show_url %}
{% if inline_admin_formset.formset.can_delete %}
{% if not field.widget.is_hidden %}
{% if not forloop.last %}
{% load i18n adminmedia admin_modify %}
{% spaceless %}
{% trans "Delete?" %}
{% trans "Remove" %}
{% trans "View on site" %}
{{ field.contents }}
{{ field.field }}
{{ field.field.errors.as_ul }}
{{ field.field.name }}
{{ field.label|capfirst }}
{{ forloop.counter0 }}
{{ inline_admin_form.deletion_field.field }}
{{ inline_admin_form.fk_field.field }}
{{ inline_admin_form.form.non_field_errors }}
{{ inline_admin_form.original }}
{{ inline_admin_form.original.id }}
{{ inline_admin_form.original_content_type_id }}
{{ inline_admin_form.pk_field.field }}
{{ inline_admin_formset.formset.management_form }}
{{ inline_admin_formset.formset.non_form_errors }}
{{ inline_admin_formset.formset.prefix }}
{{ inline_admin_formset.opts.verbose_name_plural|capfirst }}
{{ inline_admin_form|cell_count }}
{{ verbose_name }}

Here's my attempt at using Unix tools:

$ sed -rn 's/^.*(\{\{.*\}\}|\{%.*%\}).*$/\1/gp' tabular.html | sort | uniq

However the output isn't quite the same:

{% admin_media_prefix %}
{% else %}
{% endblocktrans %}
{% endfor %}
{% endif %}
{% endspaceless %}
{% for field in inline_admin_formset.fields %}
{% for field in line %}
{% for fieldset in inline_admin_form %}
{% for inline_admin_form in inline_admin_formset %}
{% for line in fieldset %}
{% if field.is_readonly %}
{% if inline_admin_form.form.non_field_errors %}
{% if inline_admin_form.original or inline_admin_form.show_url %}
{% if inline_admin_formset.formset.can_delete %}
{% if not field.widget.is_hidden %}
{% load i18n adminmedia admin_modify %}
{% spaceless %}
{% trans "Remove" %}
{{ field.contents }}
{{ field.field }}
{{ field.field.errors.as_ul }}
{{ field.field.name }}
{{ field.label|capfirst }}
{{ inline_admin_form.fk_field.field }}
{{ inline_admin_form.form.non_field_errors }}
{{ inline_admin_formset.formset.management_form }}
{{ inline_admin_formset.formset.non_form_errors }}
{{ inline_admin_formset.formset.prefix }}
{{ inline_admin_formset.opts.verbose_name_plural|capfirst }}

Unix tools are powerful and concise, but I still need to get a lot more comfortable with their syntax. Please leave a comment if you know how to fix my command.
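
One likely cause, offered as a guess rather than a tested fix: the leading `.*` in the sed expression is greedy and the substitution rewrites each line only once, so only the last tag on a line survives, whereas re.finditer in grep2.py returns every match. The Python 3 sketch below reproduces the difference on a made-up line. A tool that prints each match separately (grep's -o option, for instance) may get closer to the Python output.

```python
import re

# A made-up template line with three tags on it
line = '{% if a %}{{ b }}{% endif %}'

# sed-style: replace the whole line with the captured group (greedy .*)
greedy = re.sub(r'^.*(\{\{.*\}\}|\{%.*%\}).*$', r'\1', line)

# grep2.py-style: collect every non-overlapping match
all_matches = re.findall(r'\{\{[^{}]+\}\}|\{%[^{}]+%\}', line)

print(greedy)
print(all_matches)
```

The first print shows a single tag, the last one on the line. The second shows all three.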

How to use the bash shell with Python's subprocess module instead of /bin/sh

2011-04-13T23:25:37-07:00

By default, running subprocess.Popen with shell=True uses /bin/sh as the shell. If you want to change the shell to /bin/bash, set the executable keyword argument to /bin/bash.

Solution thanks to this great article: Working with Python subprocess - Shells, Processes, Streams, Pipes, Redirects and More

import subprocess

def bash_command(cmd):
    subprocess.Popen(cmd, shell=True, executable='/bin/bash')

bash_command('a="Apples and oranges" && echo "${a/oranges/grapes}"')

Output:

Apples and grapes

For some reason, the above didn't work for my specific case, so I had to use the following instead:

import subprocess

def bash_command(cmd):
    subprocess.Popen(['/bin/bash', '-c', cmd])
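
Note that Popen returns without waiting for the command to finish. As a sketch for Python 3.7 and later, subprocess.run accepts the same executable argument, waits for the command, and can capture its output. The function name bash_output is hypothetical, and the example assumes /bin/bash exists on the machine.

```python
import subprocess


def bash_output(cmd):
    """Run cmd under /bin/bash, wait for it, and return its stdout as text."""
    result = subprocess.run(
        cmd,
        shell=True,
        executable="/bin/bash",
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


output = bash_output('a="Apples and oranges" && echo "${a/oranges/grapes}"')
print(output, end="")
```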

Fabric post-run processing Python decorator

2010-11-06T22:00:52-07:00

import traceback
from functools import wraps

from fabric.api import env


# global variable for add_hooks()
parent_task_name = ''


def add_post_run_hook(hook, *args, **kwargs):
    '''Run hook after Fabric tasks have completed on all hosts

    Example usage:
        @add_post_run_hook(postrunfunc, 'arg1', 'arg2')
        def mytask():
            # ...

    '''
    def true_decorator(f):
        return add_hooks(post=hook, post_args=args, post_kwargs=kwargs)(f)
    return true_decorator


def add_hooks(pre=None, pre_args=(), pre_kwargs={},
              post=None, post_args=(), post_kwargs={}):
    '''
    Function decorator to be used with Fabric tasks.  Adds pre-run
    and/or post-run hooks to a Fabric task.  Uses env.all_hosts to
    determine when to run the post hook.  Uses the global variable,
    parent_task_name, to check if the task is a subtask (i.e. a
    decorated task called by another decorated task). If it is a
    subtask, do not perform pre or post processing.

    pre: callable to be run before starting Fabric tasks
    pre_args: a tuple of arguments to be passed to "pre"
    pre_kwargs: a dict of keyword arguments to be passed to "pre"
    post: callable to be run after Fabric tasks have completed on all hosts
    post_args: a tuple of arguments to be passed to "post"
    post_kwargs: a dict of keyword arguments to be passed to "post"

    '''

    # create a namespace to save state across hosts and tasks
    class NS(object):
        run_counter = 0

    def true_decorator(f):
        @wraps(f)
        def f_wrapper(*args, **kwargs):
            # set state variables
            global parent_task_name
            if not parent_task_name:
                parent_task_name = f.__name__
            NS.run_counter += 1
            print 'parent_task_name: %s' % parent_task_name
            print 'count/N_hosts: %d/%d' % (NS.run_counter, len(env.all_hosts))

            # pre-run processing
            if f.__name__ == parent_task_name and NS.run_counter == 1:
                if pre:
                    print 'Pre-run processing...'
                    pre(*pre_args, **pre_kwargs)

            # run the task
            r = None
            try:
                r = f(*args, **kwargs)
            except SystemExit:
                pass
            except:
                print traceback.format_exc()

            # post-run processing
            if (f.__name__ == parent_task_name and
                NS.run_counter >= len(env.all_hosts)):
                if post:
                    print 'Post-run processing...'
                    post(*post_args, **post_kwargs)

            return r

        return f_wrapper

    return true_decorator

How to get the date N days ago in Python

2010-10-16T00:04:22-07:00

from datetime import datetime, timedelta

N = 2

date_N_days_ago = datetime.now() - timedelta(days=N)

print(datetime.now())
print(date_N_days_ago)

Results:

2019-03-01 13:02:15.056303
2019-02-27 13:02:15.056285

For more information see the timedelta documentation

Class-based Fabric scripts via a Python metaprogramming hack

2010-09-23T23:43:03-07:00

This is a hack to enable the definition of Fabric tasks as methods in a class instead of just as module level functions. This class-based approach provides the benefits of inheritance and method overriding.

I have a history of using object-oriented techniques in places they weren't meant to be used. This one was not all my idea, so may Andrew get any blame he deserves. Here's the story:

We had several Fabric scripts which violated DRY. Andrew wished for a class-based Fabric script. We discussed ideas. Stackoverflow answered my questions. I hacked. Stackoverflow fixed it for me. I made one more tweak and here it is:

util.py:

import inspect
import sys

def add_class_methods_as_module_level_functions_for_fabric(instance, module_name):
    '''
    Utility to take the methods of the instance of a class, instance,
    and add them as functions to a module, module_name, so that Fabric
    can find and call them. Call this at the bottom of a module after
    the class definition.
    '''
    # get the module as an object
    module_obj = sys.modules[module_name]

    # Iterate over the methods of the class and dynamically create a function
    # for each method that calls the method and add it to the current module
    for method in inspect.getmembers(instance, predicate=inspect.ismethod):
        method_name, method_obj = method

        if not method_name.startswith('_'):
            # get the bound method
            func = getattr(instance, method_name)

            # add the function to the current module
            setattr(module_obj, method_name, func)

As the docstring says, this function takes the methods of a class instance and adds them as functions to the module (fabfile.py) so Fabric can find and call them. Here is an example.

base.py:

from fabric import api as fab

class Deployment(object):
    name = ''
    local_file = ''
    remote_file = ''

    def base_task1(self):
        'base task 1'
        fab.run('svn export /path/to/{self.name}'.format(self=self))

    def base_task2(self):
        'base task 2'
        fab.put(self.local_file, self.remote_file)

fabfile.py:

import base
import util
from fabric import api as fab

class _MyWebsiteDeployment(base.Deployment):
    name = 'my_website'
    local_file = '/local/path/to/my_website/file'
    remote_file = '/remote/path/to/my_website/file'

    def my_website_task(self):
        'my website task'
        fab.run('echo "I am special"')

instance = _MyWebsiteDeployment()
util.add_class_methods_as_module_level_functions_for_fabric(instance, __name__)

Running fab -l gives:

$ fab -l
Available commands:

    base_task1       base task 1
    base_task2       base task 2
    my_website_task  my website task
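
The mechanism can be tried without Fabric installed. The Python 3 sketch below uses a throwaway module object in place of fabfile.py. All names in it (add_methods_to_module, Base, _Child, fake_fabfile) are made up for the demonstration.

```python
import inspect
import types


def add_methods_to_module(instance, module_obj):
    """Copy the public bound methods of instance onto module_obj."""
    for name, method in inspect.getmembers(instance, predicate=inspect.ismethod):
        if not name.startswith("_"):
            setattr(module_obj, name, method)


class Base(object):
    def base_task(self):
        "base task"
        return "base"


class _Child(Base):
    def child_task(self):
        "child task"
        return "child"

    def _private(self):
        return "hidden"


# A throwaway module standing in for fabfile.py
fake_fabfile = types.ModuleType("fake_fabfile")
add_methods_to_module(_Child(), fake_fabfile)

public = sorted(n for n in vars(fake_fabfile) if not n.startswith("_"))
print(public)
print(fake_fabfile.child_task())
```

Inherited methods show up next to the subclass's own, and names starting with an underscore are skipped, which matches the fab -l listing above.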

Twisted web POST example w/ JSON

2010-08-25T20:09:00-07:00

This is an example of a simple asynchronous Python web server using Twisted. This is a copy of Jp Calderone's Twisted Web in 60 seconds: handling POSTs example modified to accept a JSON payload in the POST request instead of form data. It also uses his Simple Python Web Server example to run the web server as a daemon with twistd.

`webserver.py`

"""
http://jcalderone.livejournal.com/49707.html
http://labs.twistedmatrix.com/2008/02/simple-python-web-server.html

usage:
        $ twistd -y webserver.py
"""


from pprint import pprint
from twisted.application.internet import TCPServer
from twisted.application.service import Application
from twisted.web.resource import Resource
from twisted.web.server import Site


class FormPage(Resource):
    def render_GET(self, request):
        return ''

    def render_POST(self, request):
        pprint(request.__dict__)
        newdata = request.content.getvalue()
        print newdata
        return ''


root = Resource()
root.putChild("form", FormPage())
application = Application("My Web Service")
TCPServer(8880, Site(root)).setServiceParent(application)

`test_post.py`

Here is a simple test client using httplib2 to send a POST request with some JSON data. I used Mark Pilgrim's Dive Into Python 3 Section 14.6 as a reference.

import httplib2
from datetime import datetime
import simplejson


TESTDATA = {'woggle': {'version': 1234,
                       'updated': str(datetime.now()),
                       }}
URL = 'http://localhost:8880/form'

jsondata = simplejson.dumps(TESTDATA)
h = httplib2.Http()
resp, content = h.request(URL,
                          'POST',
                          jsondata,
                          headers={'Content-Type': 'application/json'})
print resp
print content

Run the web server

$ twistd -y webserver.py

Run the test POST

$ python test_post.py

twistd.log

Here are the results stored in twistd.log.

2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1] {'_adapterCache': {},
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'args': {},
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'channel': <twisted.web.http.HTTPChannel instance at 0x7fb409dc8248>,
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'client': <twisted.internet.address.IPv4Address object at 0x1b48f50>,
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'clientproto': 'HTTP/1.1',
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'content': <cStringIO.StringO object at 0x1b4c068>,
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'cookies': [],
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'headers': {'date': 'Thu, 26 Aug 2010 03:02:37 GMT', 'content-type': 'text/html', 'server': 'TwistedWeb/10.0.0'},
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'host': <twisted.internet.address.IPv4Address object at 0x1b48fd0>,
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'method': 'POST',
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'notifications': [],
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'path': '/form',
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'postpath': [],
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'prepath': ['form'],
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'queued': 0,
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'received_cookies': {},
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'received_headers': {'host': 'localhost:8880', 'content-type': 'application/json', 'accept-encoding': 'identity', 'content-length': '70', 'user-agent': 'Python-httplib2/$Rev$'},
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'requestHeaders': Headers({'host': ['localhost:8880'], 'content-type': ['application/json'], 'accept-encoding': ['identity'], 'content-length': ['70'], 'user-agent': ['Python-httplib2/$Rev$']}),
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'responseHeaders': Headers({'date': ['Thu, 26 Aug 2010 03:02:37 GMT'], 'content-type': ['text/html'], 'server': ['TwistedWeb/10.0.0']}),
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'site': <twisted.web.server.Site instance at 0x1b419e0>,
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'sitepath': [],
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'stack': [],
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'transport': <HTTPChannel #1 on 8880>,
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1]  'uri': '/form'}
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1] {"woggle": {"updated": "2010-08-25 20:02:37.449333", "version": 1234}}
2010-08-25 20:02:37-0700 [HTTPChannel,1,127.0.0.1] 127.0.0.1 - - [26/Aug/2010:03:02:36 +0000] "POST /form HTTP/1.1" 200 - "-" "Python-httplib2/$Rev$"
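
The server above only prints the raw body. If you want to work with the payload, one option is to decode it with the standard `json` module. This is a small Python 3 sketch; the helper name `parse_json_body` is my own, and the sample body is copied from the log above.

```python
import json


def parse_json_body(raw_body):
    """Decode a raw POST body (bytes) into a Python object, or None if invalid."""
    try:
        return json.loads(raw_body.decode('utf-8'))
    except ValueError:
        # UnicodeDecodeError and json.JSONDecodeError both subclass ValueError
        return None


body = b'{"woggle": {"updated": "2010-08-25 20:02:37.449333", "version": 1234}}'
payload = parse_json_body(body)
```

Returning `None` for a bad body is just one choice; a real handler would probably respond with a 400 status instead.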

Quick notes on trying the Twisted websocket branch example

2010-05-24T14:43:32-07:00

Here are my quick notes on trying out the websocket example in Twisted's websocket branch. The documentation is here. The Twisted ticket is here. This came about after some conversation with @clemesha on Twitter.

(A Web Socket is a new, still-in-development technology, introduced in HTML5, and may be used for real-time web applications. It provides a simple (maybe better) alternative to existing Comet technology.)

(Note: The WebSocket API is still changing. Google Chrome supports (a version of) it. Firefox, as of version 3.6, does not support it yet.)

(I am no expert on Web Sockets. I just think they are cool and want to start using them.)

Install Twisted websocket branch

Install pip and virtualenv

Install the Twisted websocket branch in a virtualenv

$ cd ~/lib/python-environments
$ virtualenv --no-site-packages --distribute twisted-websocket-branch
$ pip install -E twisted-websocket-branch/ -e svn+svn://svn.twistedmatrix.com/svn/Twisted/branches/websocket-4173-2

`~/wsdemo/index.html`

<html>
  <head>
    <title>WebSocket example: echo service</title>
  </head>
  <body>
    <h1>WebSocket example: echo service</h1>
    <script type="text/javascript">
        var ws = new WebSocket("ws://127.0.0.1:8080/ws/echo");
        ws.onmessage = function(evt) {
            var data = evt.data;
            var target = document.getElementById("received");
            target.value = target.value + data;
        };
        window.send_data = function() {
            ws.send(document.getElementById("send_input").value);
        };
    </script>
    <form>
      <label for="send_input">Text to send</label>
      <input type="text" name="send_input" id="send_input"/>
      <input type="submit" name="send_submit" id="send_submit" value="Send"
             onclick="send_data(); return false"/>
      <br>
      <label for="received">Received text</label>
      <textarea name="received" id="received"></textarea>
    </form>
  </body>
</html>

`~/wsdemo/demo.py`

import sys
from twisted.python import log
from twisted.internet import reactor
from twisted.web.static import File
from twisted.web.websocket import WebSocketHandler, WebSocketSite


class Echohandler(WebSocketHandler):
    def frameReceived(self, frame):
        log.msg("Received frame '%s'" % frame)
        self.transport.write(frame + "\n")


def main():
    log.startLogging(sys.stdout)
    root = File(".")
    site = WebSocketSite(root)
    site.addHandler("/ws/echo", Echohandler)
    reactor.listenTCP(8080, site)
    reactor.run()


if __name__ == "__main__":
    main()

Try it

Activate virtualenv

$ source ~/lib/python-environments/twisted-websocket-branch/bin/activate

Run server
```
$ cd ~/wsdemo
$ python demo.py
```
Visit http://localhost:8080/ in your WebSocket-enabled browser (e.g. Google Chrome)

Here's the console output:

2010-05-25 21:47:46-0700 [-] Log opened.
2010-05-25 21:47:46-0700 [-] twisted.web.websocket.WebSocketSite starting on 8080
2010-05-25 21:47:46-0700 [-] Starting factory <twisted.web.websocket.WebSocketSite instance at 0x94243ac>
2010-05-25 21:47:56-0700 [HTTPChannel,0,127.0.0.1] 127.0.0.1 - - [26/May/2010:04:47:56 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.9 Safari/533.2"
2010-05-25 21:47:56-0700 [HTTPChannel,1,127.0.0.1] 127.0.0.1 - - [26/May/2010:04:47:56 +0000] "GET /favicon.ico HTTP/1.1" 404 145 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.9 Safari/533.2"
2010-05-25 21:48:16-0700 [HTTPChannel,2,127.0.0.1] Received frame 'hello'
2010-05-25 21:48:25-0700 [HTTPChannel,2,127.0.0.1] Received frame 'Twisted+Websocket!'

My Emacs Python environment

2010-05-10T10:29:32-07:00

Python mode

There are two Python modes: python-mode.el created by the Python community and python.el created by the Emacs community (David Love). The first version of python.el was included in Emacs 22 and an improved version was included in Emacs 23. I am currently using the default python.el included with Emacs 23. Below are some useful commands. For a complete list, open a Python file and run M-x apropos RET python RET

python-shift-left, C-c C-<: Decrease indentation of the region
python-shift-right, C-c C->: Increase indentation of the region
python-switch-to-python, C-c C-z: Start (or switch) to a Python shell
python-send-buffer, C-c C-c: Run the current buffer in the Python interpreter
python-send-region, C-c C-r: Run the selected code in the Python interpreter
python-describe-symbol, C-c C-f: Get help on a Python symbol (Better than visiting the slow Python website, right?)

Configuration (`~/.emacs` changes)

(setq-default indent-tabs-mode nil)    ; use only spaces and no tabs
(setq default-tab-width 4)

Ropemacs

Ropemacs is an Emacs plugin to use Rope, a powerful Python refactoring library. It supports code completion, renaming, and other refactoring. Pymacs is a prerequisite for Ropemacs. Below are some useful commands. For more information, look in the Rope menu.

rope-code-assist, M-/: Code completion
rope-rename, C-c r r: Rename a variable, function, etc.

Installation

Note: pip install does not work with v0.25. Need to use `make install`

Install Pymacs (Emacs part)

$ curl -L https://github.com/pinard/Pymacs/tarball/v0.24-beta2 | tar zx
$ cd pinard-Pymacs-016b0bc
$ make 
$ mkdir -p ~/.emacs.d/vendor/pymacs-0.24-beta2
$ cp pymacs.el ~/.emacs.d/vendor/pymacs-0.24-beta2/pymacs.el 
$ emacs -batch -eval '(byte-compile-file "~/.emacs.d/vendor/pymacs-0.24-beta2/pymacs.el")'

Install Pymacs (Python part)

$ sudo pip install https://github.com/pinard/Pymacs/tarball/v0.24-beta2

Install Ropemacs and Rope

$ sudo pip install http://bitbucket.org/agr/ropemacs/get/tip.tar.gz

Edit ~/.emacs to use Ropemacs

(add-to-list 'load-path "~/.emacs.d/vendor/pymacs-0.24-beta2")
(require 'pymacs)
(pymacs-load "ropemacs" "rope-")
(setq ropemacs-enable-autoimport t)

Auto Complete Mode

Auto Complete Mode is an extension to make Emacs auto-completion nicer. It creates a dropdown in the middle of your text. It can use Ropemacs as a backend.

Download, unpack and byte-compile

$ cd ~/.emacs.d/vendor 
$ curl http://cx4a.org/pub/auto-complete/auto-complete-1.2.tar.bz2 | tar jx 
$ cd auto-complete-1.2 
$ make byte-compile

Edit ~/.emacs

(add-to-list 'load-path "~/.emacs.d/vendor/auto-complete-1.2")
(require 'auto-complete-config)
(add-to-list 'ac-dictionary-directories "~/.emacs.d/vendor/auto-complete-1.2/dict")
(ac-config-default)

Yasnippet

YASnippet is a template system inspired by Textmate. Demo on YouTube

Code checking option 1: On-the-fly w/ Flymake

Flymake is part of Emacs 23. The setup below runs PyFlakes and pep8 through it (see also pylint). Note: flymake-cursor.el is not required, but it makes flymake less obtrusive.

Install PyFlakes and pep8
```
$ sudo pip install pyflakes pep8
```
Get flymake-cursor.el and save it as ~/.emacs.d/vendor/flymake-cursor.el.

Edit ~/.emacs:

(add-to-list 'load-path "~/.emacs.d/vendor")

(add-hook 'find-file-hook 'flymake-find-file-hook)
(when (load "flymake" t)
  (defun flymake-pyflakes-init ()
    (let* ((temp-file (flymake-init-create-temp-buffer-copy
               'flymake-create-temp-inplace))
       (local-file (file-relative-name
            temp-file
            (file-name-directory buffer-file-name))))
      (list "pycheckers"  (list local-file))))
   (add-to-list 'flymake-allowed-file-name-masks
             '("\\.py\\'" flymake-pyflakes-init)))
(load-library "flymake-cursor")
(global-set-key [f10] 'flymake-goto-prev-error)
(global-set-key [f11] 'flymake-goto-next-error)

Create pycheckers, make it executable, and put it on your PATH

#!/bin/bash

pyflakes "$1"
pep8 --ignore=E221,E701,E202 --repeat "$1"
true

Code checking option 2: Manual checking w/ python-check

PyFlakes is a Lint-like tool for Python. It is like PyChecker, but it is fast. PyFlakes can be run from within Emacs using the default python mode in Emacs23.

Install PyFlakes
```
$ sudo pip install pyflakes
```
Edit ~/.emacs
```
(setq python-check-command "pyflakes")
```
From Emacs, while visiting a Python file, run M-x python-check or C-c C-v or C-c C-w.

Django

Reference

$ wget http://ourcomments.org/Emacs/DL/elisp/nxhtml/zip/nxhtml-2.08-100425.zip 
$ unzip nxhtml-2.08-100425.zip

(load "~/.emacs.d/nxhtml/autostart.el")
(setq mumamo-background-colors nil) 
(add-to-list 'auto-mode-alist '("\\.html$" . django-html-mumamo-mode))

References

EnigmaCurry: Emacs as a powerful Python IDE (2008-03-09)
EnigmaCurry: AutoComplete.el : Python Code Completion in Emacs (2009-01-21)
Lei Chen: Setup Perfect Python Environment In Emacs (2009-01-30)
Stack Overflow: Switching from python-mode.el to python.el (2008-12-12)
Stack Overflow: python emacs completion (2009-04-15)
Stack Overflow: How can I use Emacs Flymake mode for python with pyflakes and pylint checking code?
Jesse D. Legg: Emacs for Python Programmers: Basics (2010-03-14)
Jesse D. Legg: Emacs for Python Programmers #2: Virtualenv, IPython & Daemon Mode (2010-03-14)
Pymacs installation documentation
Edward O'Connor: ropemacs (2008-02-27)
NIPY Documentation: Tricked out emacs for python coding
Richard Riley: Emacs programming customisations
Emacs Documentation: Flymake
amix.dk: Static checking Python code

Notes on Python and beanstalkd on Ubuntu Karmic

2010-04-28T17:21:38-07:00

Beanstalk

Install and run beanstalkd

Reference: beanstalkd README

Install dependencies (libevent 1.4.1+)

$ sudo apt-get update 
$ sudo apt-get install libevent-1.4-2 libevent-dev

Download and unpack beanstalkd

$ curl http://xph.us/dist/beanstalkd/beanstalkd-1.4.4.tar.gz | tar zx

Build

$ cd beanstalkd-1.4.4 
$ ./configure 
$ make

Run the beanstalkd server

$ ./beanstalkd -d -l 127.0.0.5 -p 11300

Install the Python client library

Install pip

Install PyYAML (beanstalkc dependency)

$ sudo apt-get build-dep python-yaml 
$ sudo pip install PyYAML

Install beanstalkc
```
$ sudo pip install beanstalkc 
```

Example

Reference: beanstalkc TUTORIAL

import beanstalkc

# connect to server
bean = beanstalkc.Connection(host='127.0.0.5', port=11300)

# put jobs in queue
for i in range(5):
    print 'Put data: %d' % i
    bean.put(str(i))

# get jobs from queue
while True:
    job = bean.reserve()
    print 'Got data: %s' % job.body
    job.delete()

Results:

Put data: 0
Put data: 1
Put data: 2
Put data: 3
Put data: 4
Got data: 0
Got data: 1
Got data: 2
Got data: 3
Got data: 4
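
Note that the `while True` loop above blocks in `reserve()` once the queue is empty. As far as I know, beanstalkc's `reserve` accepts a `timeout` argument and returns `None` when no job arrives in time, so a loop that stops on an empty queue could look like the sketch below (Python 3). `drain_queue` is a name I made up, and `FakeJob` / `FakeConnection` are hypothetical stand-ins so the example runs without a beanstalkd server.

```python
def drain_queue(conn, timeout=0):
    """Reserve and delete jobs until no job is available within `timeout`."""
    bodies = []
    while True:
        job = conn.reserve(timeout=timeout)
        if job is None:
            # assumption: None means the reserve timed out (queue is empty)
            break
        bodies.append(job.body)
        job.delete()
    return bodies


class FakeJob(object):
    """Hypothetical stand-in for a beanstalkc job (demo only)."""
    def __init__(self, body):
        self.body = body
        self.deleted = False

    def delete(self):
        self.deleted = True


class FakeConnection(object):
    """Hypothetical stand-in for beanstalkc.Connection (demo only)."""
    def __init__(self, bodies):
        self._jobs = [FakeJob(b) for b in bodies]

    def reserve(self, timeout=None):
        return self._jobs.pop(0) if self._jobs else None


bodies = drain_queue(FakeConnection(['0', '1', '2']))
```

With a real server you would pass the `bean` connection from the example instead of the fake one.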

Using a Python timeout decorator for uploading to S3

2010-04-27T15:55:58-07:00

At work we are uploading many images to S3 using Python's boto library. However we are experiencing a RequestTimeTooSkewed error once every 100 uploads on average. We googled, but did not find a solution. Our system time was in sync and our file sizes were small (~50KB).

Since we couldn't find the root cause, we added a watchdog timer as a bandaid solution. We already use a retry decorator to retry uploads to S3 when we get a 500 Internal Server Error response. To this we added a timeout decorator which cancels the S3 upload if it takes more than a couple minutes. With this decorator, we don't have to wait the full 15 minutes before S3 returns the 403 Forbidden (RequestTimeTooSkewed error) response.

I found the timeout decorator at Activestate's Python recipes. It makes use of Python's signal library. Below is an example of how it's used.

import signal

class TimeoutError(Exception):
    def __init__(self, value = "Timed Out"):
        self.value = value
    def __str__(self):
        return repr(self.value)

def timeout(seconds_before_timeout):
    def decorate(f):
        def handler(signum, frame):
            raise TimeoutError()
        def new_f(*args, **kwargs):
            old = signal.signal(signal.SIGALRM, handler)
            signal.alarm(seconds_before_timeout)
            try:
                result = f(*args, **kwargs)
            finally:
                # reinstall the old signal handler
                signal.signal(signal.SIGALRM, old)
                # cancel the alarm
                # this line should be inside the "finally" block (per Sam Kortchmar)
                signal.alarm(0)
            return result
        new_f.func_name = f.func_name
        return new_f
    return decorate

Try it out:

import time

@timeout(5)
def function_times_out():
    print "Start"
    for i in range(1,10):
        time.sleep(1)
        print "%d seconds have passed" % i

if __name__ == '__main__':
    function_times_out()

Results:

Start
1 seconds have passed
2 seconds have passed
3 seconds have passed
4 seconds have passed
Traceback (most recent call last):
  File "timeout_ex.py", line 47, in <module>
    function_times_out()
  File "timeout_ex.py", line 17, in new_f
    result = f(*args, **kwargs)
  File "timeout_ex.py", line 42, in function_times_out
    time.sleep(1)
  File "timeout_ex.py", line 12, in handler
    raise TimeoutError()
__main__.TimeoutError: 'Timed Out'

Bug found by Sam Kortchmar (added 2018-08-18)

The code on the Activestate recipe has signal.alarm(0) outside of the finally block, but Sam Kortchmar reported to me that it needs to be inside the finally block so that the alarm will be cancelled even if there is an exception in the user's function that is handled by the user. With signal.alarm(0) outside of the finally block, the alarm still fires in that case.

Here is the test case sent by Sam:

import unittest2
import time

class TestTimeout(unittest2.TestCase):
    def test_watchdog_doesnt_kill_interpreter(self):
        """If this test executes at all, it's working!
        otherwise, the whole testing section will be killed
        and print out "Alarm clock"
        """
        @timeout(1)
        def my_func():
            raise Exception

        try:
            my_func()
        except Exception:
            pass
        time.sleep(1.2)
        assert True
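
For reference, here is how I would sketch the same decorator for Python 3, with the alarm cancelled inside the `finally` block as Sam suggested. It assumes a Unix system and that the decorated function runs in the main thread (signals only work there). I renamed the exception to `TimedOut` because Python 3 already has a built-in `TimeoutError`, and I used `signal.setitimer` instead of `signal.alarm` so that fractional seconds work. Both choices are mine, not part of the original recipe.

```python
import functools
import signal
import time


class TimedOut(Exception):
    """Raised when the decorated call exceeds its time budget (hypothetical name)."""


def timeout(seconds):
    def decorate(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            def handler(signum, frame):
                raise TimedOut('timed out after %s seconds' % seconds)

            old_handler = signal.signal(signal.SIGALRM, handler)
            # setitimer accepts fractional seconds, unlike signal.alarm
            signal.setitimer(signal.ITIMER_REAL, seconds)
            try:
                return f(*args, **kwargs)
            finally:
                # cancel the timer even if f raised, then restore the old handler
                signal.setitimer(signal.ITIMER_REAL, 0)
                signal.signal(signal.SIGALRM, old_handler)
        return wrapper
    return decorate


@timeout(0.2)
def slow():
    time.sleep(2)
    return 'finished'


@timeout(0.2)
def fast():
    return 'finished'
```

Calling `slow()` should raise `TimedOut` after about 0.2 seconds, while `fast()` returns normally and leaves no pending alarm behind.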

The RequestTimeTooSkewed error

S3ResponseError: 403 Forbidden
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>RequestTimeTooSkewed</Code><Message>The difference between the request time and the current time is too large.</Message><MaxAllowedSkewMilliseconds>900000</MaxAllowedSkewMilliseconds><RequestId>7DDDS67HF8E37</RequestId><HostId>LKE893JFGDLASKJR9BJ-A9NASFPNAOPWEJORG-98DFGJA498JVJ-A04320JF0293JLKE</HostId><RequestTime>Tue, 27 Apr 2010 22:20:58 GMT</RequestTime><ServerTime>2010-04-27T22:55:24Z</ServerTime></Error>

Options for listing the files in a directory with Python

2010-04-19T16:57:04-07:00

I do a lot of sysadmin-type work with Python so I often need to list the contents of a directory on a filesystem. Here are 4 methods I've used so far to do that. Let me know if you have any good alternatives. The examples were run on my Ubuntu Karmic machine.

OPTION 1 - `os.listdir()`

This is probably the simplest way to list the contents of a directory in Python.

import os
dirlist = os.listdir("/usr")

from pprint import pprint
pprint(dirlist)

Results:

['lib',
 'shareFeisty',
 'src',
 'bin',
 'local',
 'X11R6',
 'lib64',
 'sbin',
 'share',
 'include',
 'lib32',
 'man',
 'games']

OPTION 2 - `glob.glob()`

This method allows you to use shell-style wildcards.

import glob
dirlist = glob.glob('/usr/*')

from pprint import pprint
pprint(dirlist)

Results:

['/usr/lib',
 '/usr/shareFeisty',
 '/usr/src',
 '/usr/bin',
 '/usr/local',
 '/usr/X11R6',
 '/usr/lib64',
 '/usr/sbin',
 '/usr/share',
 '/usr/include',
 '/usr/lib32',
 '/usr/man',
 '/usr/games']

OPTION 3 - Unix "ls" command using `subprocess`

This method uses your operating system's "ls" command. It allows you to sort the output based on modification time, file size, etc. by passing these command-line options to the "ls" command. The following example lists the 10 most recently modified files in /var/log:

from subprocess import Popen, PIPE

def listdir_shell(path, *lsargs):
    p = Popen(('ls', path) + lsargs, shell=False, stdout=PIPE, close_fds=True)
    return [path.rstrip('\n') for path in p.stdout.readlines()]

dirlist = listdir_shell('/var/log', '-t')[:10]

from pprint import pprint
pprint(dirlist)

Results:

['auth.log',
 'syslog',
 'dpkg.log',
 'messages',
 'user.log',
 'daemon.log',
 'debug',
 'kern.log',
 'munin',
 'mysql.log']
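
If all you need from "ls" is the sort by modification time, the same result can probably be had without a subprocess. Here is a minimal Python 3 sketch using only `os`; the function name `listdir_by_mtime` and its `limit` parameter are my own invention.

```python
import os


def listdir_by_mtime(path, limit=None):
    """Return the entry names in `path`, most recently modified first."""
    names = os.listdir(path)
    # lstat so a broken symlink does not raise; sort newest first like "ls -t"
    names.sort(key=lambda name: os.lstat(os.path.join(path, name)).st_mtime,
               reverse=True)
    return names[:limit]
```

For example, `listdir_by_mtime('/var/log', limit=10)` would be the rough equivalent of `listdir_shell('/var/log', '-t')[:10]` above.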

OPTION 4 - Unix "find" style using `os.walk`

This method allows you to list directory contents recursively in a manner similar to the Unix "find" command. It uses Python's os.walk.

import os

def unix_find(pathin):
    """Return results similar to the Unix find command run without options
    i.e. traverse a directory tree and return all the file paths
    """
    return [os.path.join(path, file)
            for (path, dirs, files) in os.walk(pathin)
            for file in files]

pathlist = unix_find('/etc')[-10:]

from pprint import pprint
pprint(pathlist)

Results:

['/etc/fonts/conf.avail/20-lohit-gujarati.conf',
 '/etc/fonts/conf.avail/69-language-selector-zh-mo.conf',
 '/etc/fonts/conf.avail/11-lcd-filter-lcddefault.conf',
 '/etc/cron.weekly/0anacron',
 '/etc/cron.weekly/cvs',
 '/etc/cron.weekly/popularity-contest',
 '/etc/cron.weekly/man-db',
 '/etc/cron.weekly/apt-xapian-index',
 '/etc/cron.weekly/sysklogd',
 '/etc/cron.weekly/.placeholder']

Notes on using Gearman with Python

2010-04-17T15:53:57-07:00

We recently looked at some lightweight message queue options for an S3 uploader tool. One of the options we tried was Gearman. Gearman was originally developed by Brad Fitzpatrick (author of memcached). Gearman seems to be mature and actively developed. The original Perl version has been rewritten in C for improved performance and I found it easy to use. Here are my notes for getting started on Ubuntu Karmic.

Install and run Gearman server (C version)

There are 2 versions of the Gearman server: the new C version and the original Perl version. I chose the C version.

sudo apt-get install gearman-job-server

During the installation process, Ubuntu/Apt starts the Gearman server process. Running ps -ef | grep gearmand shows me:

gearman    497     1  0 15:41 ?        00:00:00 /usr/sbin/gearmand --pid-file=/var/run/gearman/gearmand.pid --user=gearman --daemon --log-file=/var/log/gearman-job-server/gearman.log

This shows the log file is at /var/log/gearman-job-server/gearman.log. Also, it listens at address 127.0.0.1 and port 4730 by default. You can change the address, port, etc. via the command-line options. To see all the options, type gearmand --help.

Install Python Gearman client library

Install pip
Install gearman library
```
sudo pip install gearman
```

Example

This example is taken from Graham's article.

producer.py:

import time
from gearman import GearmanClient, Task

client = GearmanClient(["127.0.0.1"])

for i in range(5):
    client.dispatch_background_task('speak', i)
    print 'Dispatched %d' % i
    time.sleep(1)

consumer.py:

from gearman import GearmanWorker

def speak(job):
    r = 'Hello %s' % job.arg
    print r
    return r

worker = GearmanWorker(["127.0.0.1"])
worker.register_function('speak', speak, timeout=3)
worker.work()

First running python producer.py gives me the following terminal output:

Dispatched 0
Dispatched 1
Dispatched 2
Dispatched 3
Dispatched 4

Then running python consumer.py gives me the following terminal output:

Hello 0
Hello 1
Hello 2
Hello 3
Hello 4

Monitoring a filesystem with Python and Pyinotify

2010-04-09T10:02:39-07:00

Pyinotify is a Python library for monitoring filesystem events on Linux through the inotify Linux kernel subsystem. It can monitor when a file is created, accessed, deleted, modified, etc. For a full list of Pyinotify events see the documentation.

Install Pyinotify

Install pip
Install Pyinotify
```
$ sudo pip install pyinotify
```

Example

import pyinotify

class MyEventHandler(pyinotify.ProcessEvent):
    def process_IN_ACCESS(self, event):
        print "ACCESS event:", event.pathname

    def process_IN_ATTRIB(self, event):
        print "ATTRIB event:", event.pathname

    def process_IN_CLOSE_NOWRITE(self, event):
        print "CLOSE_NOWRITE event:", event.pathname

    def process_IN_CLOSE_WRITE(self, event):
        print "CLOSE_WRITE event:", event.pathname

    def process_IN_CREATE(self, event):
        print "CREATE event:", event.pathname

    def process_IN_DELETE(self, event):
        print "DELETE event:", event.pathname

    def process_IN_MODIFY(self, event):
        print "MODIFY event:", event.pathname

    def process_IN_OPEN(self, event):
        print "OPEN event:", event.pathname

def main():
    # watch manager
    wm = pyinotify.WatchManager()
    wm.add_watch('/var/log', pyinotify.ALL_EVENTS, rec=True)

    # event handler
    eh = MyEventHandler()

    # notifier
    notifier = pyinotify.Notifier(wm, eh)
    notifier.loop()

if __name__ == '__main__':
    main()

Results:

MODIFY event: /var/log/auth.log
MODIFY event: /var/log/syslog
MODIFY event: /var/log/syslog
MODIFY event: /var/log/auth.log
MODIFY event: /var/log/auth.log
MODIFY event: /var/log/syslog
MODIFY event: /var/log/auth.log
MODIFY event: /var/log/syslog
MODIFY event: /var/log/auth.log
OPEN event: /var/log/munin/munin-update.log
MODIFY event: /var/log/munin/munin-update.log
MODIFY event: /var/log/munin/munin-update.log
MODIFY event: /var/log/munin/munin-node.log
MODIFY event: /var/log/munin/munin-update.log
MODIFY event: /var/log/munin/munin-update.log
CLOSE_WRITE event: /var/log/munin/munin-update.log
OPEN event: /var/log/munin/munin-limits.log
MODIFY event: /var/log/munin/munin-limits.log
CLOSE_WRITE event: /var/log/munin/munin-limits.log
OPEN event: /var/log/munin/munin-graph.log
MODIFY event: /var/log/munin/munin-graph.log
CLOSE_WRITE event: /var/log/munin/munin-graph.log
OPEN event: /var/log/munin/munin-html.log
MODIFY event: /var/log/munin/munin-html.log
CLOSE_WRITE event: /var/log/munin/munin-html.log
MODIFY event: /var/log/auth.log
MODIFY event: /var/log/auth.log
MODIFY event: /var/log/syslog
MODIFY event: /var/log/auth.log
MODIFY event: /var/log/auth.log
MODIFY event: /var/log/syslog
MODIFY event: /var/log/auth.log
...

How to sort a list of dicts in Python

2010-04-02T10:46:58-07:00

I'm using the MongoDB group function (it's similar to SQL's GROUP BY) to aggregate some results for my live-log-analyzer project. This function is pretty cool, but it does not sort the grouped data. Here is how to sort the data. (It is only one line of Python, but I have a hard time remembering how to do this.)

DATA is the output of the MongoDB group function. I want to sort this list of dicts by 'ups_ad'.

from pprint import pprint

DATA = [
    {u'avg': 2.9165000000000001,
     u'count': 10.0,
     u'total': 29.165000000000003,
     u'ups_ad': u'10.194.154.49:80'},
    {u'avg': 2.6931000000000003,
     u'count': 10.0,
     u'total': 26.931000000000001,
     u'ups_ad': u'10.194.155.176:80'},
    {u'avg': 1.9860909090909091,
     u'count': 11.0,
     u'total': 21.847000000000001,
     u'ups_ad': u'10.195.71.146:80'},
    {u'avg': 1.742818181818182,
     u'count': 11.0,
     u'total': 19.171000000000003,
     u'ups_ad': u'10.194.155.48:80'}
    ]

data_sorted = sorted(DATA, key=lambda item: item['ups_ad'])
pprint(data_sorted)

Results:

[{u'avg': 2.9165000000000001,
  u'count': 10.0,
  u'total': 29.165000000000003,
  u'ups_ad': u'10.194.154.49:80'},
 {u'avg': 2.6931000000000003,
  u'count': 10.0,
  u'total': 26.931000000000001,
  u'ups_ad': u'10.194.155.176:80'},
 {u'avg': 1.742818181818182,
  u'count': 11.0,
  u'total': 19.171000000000003,
  u'ups_ad': u'10.194.155.48:80'},
 {u'avg': 1.9860909090909091,
  u'count': 11.0,
  u'total': 21.847000000000001,
  u'ups_ad': u'10.195.71.146:80'}]
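
A variation I find easier to remember is `operator.itemgetter` in place of the lambda, with `reverse=True` for descending order. The sketch below (Python 3) uses an abbreviated copy of the data above, with the averages rounded and the other fields dropped to keep it short.

```python
from operator import itemgetter

# Abbreviated sample: rounded 'avg' values, other fields omitted.
rows = [
    {'avg': 2.9165, 'ups_ad': '10.194.154.49:80'},
    {'avg': 2.6931, 'ups_ad': '10.194.155.176:80'},
    {'avg': 1.9861, 'ups_ad': '10.195.71.146:80'},
    {'avg': 1.7428, 'ups_ad': '10.194.155.48:80'},
]

# Same ordering as the lambda version: a plain string sort on 'ups_ad'.
by_address = sorted(rows, key=itemgetter('ups_ad'))

# Highest average first.
by_avg_desc = sorted(rows, key=itemgetter('avg'), reverse=True)
```

Keep in mind that sorting on 'ups_ad' compares strings, which is why 10.194.155.176 sorts before 10.194.155.48.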

References:

Update 2010-04-28: Apparently I didn't use Google properly when I first wrote this post. Searching today produced several sources for doing exactly this.

os.path.relpath() source code for Python 2.5

2010-03-31T10:30:01-07:00

Today I needed to use the os.path.relpath() function. However this function was introduced in Python 2.6 and I am using Python 2.5 for my project. Luckily, James Gardner has written a version that works with Python 2.5 (on Posix systems (which mine is (Linux))). His relpath function is part of his BareNecessities package. You can view the documentation here.

Here is James Gardner's relpath function:

import posixpath
from posixpath import curdir, sep, pardir, join

def relpath(path, start=curdir):
    """Return a relative version of a path"""
    if not path:
        raise ValueError("no path specified")
    start_list = posixpath.abspath(start).split(sep)
    path_list = posixpath.abspath(path).split(sep)
    # Work out how much of the filepath is shared by start and path.
    i = len(posixpath.commonprefix([start_list, path_list]))
    rel_list = [pardir] * (len(start_list)-i) + path_list[i:]
    if not rel_list:
        return curdir
    return join(*rel_list)
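
To get a feel for what the function should return, the standard library's own `posixpath.relpath` (available from Python 2.6 on) can serve as a reference. The paths below are made-up examples and are not taken from James Gardner's documentation.

```python
import posixpath

# (path, start) pairs and the relative path expected for each
examples = [
    (('/a/b/c', '/a'), 'b/c'),
    (('/a/b/c', '/a/d'), '../b/c'),
    (('/a/b', '/a/b'), '.'),
]

results = [posixpath.relpath(path, start) for (path, start), _ in examples]
```

I would expect the backported function above to give the same answers for absolute paths like these.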

Two of the simplest Python decorator examples

2010-03-09T00:00:00-08:00

After trying for about the fifth time, I think I am starting to understand Python decorators due largely to Jack Diederich's PyCon 2009 talk, Class Decorators: Radically Simple.

Jack's practical definition of a decorator is:

A function that takes one argument
Returns something useful

In many cases, a function decorator can be described more specifically:

A function that takes one argument (the function being decorated)
Returns the same function or a function with a similar signature

As Jack states in his talk, a decorator is merely syntactic sugar. The same functionality can be achieved without using the decorator syntax. This code snippet:

@mydecorator
def myfunc():
    pass

is equivalent to:

def myfunc():
    pass
myfunc = mydecorator(myfunc)

Here are two of the simplest examples from Jack's talk:

Identity decorator

This is the simplest decorator. It does nothing. It takes the decorated function as an argument and returns the same function without doing anything.

def identity(ob):
    return ob

@identity
def myfunc():
    print "my function"

myfunc()
print myfunc

my function
<function myfunc at 0xb76db17c>

Hello world decorator

I am dumb. This one doesn't do what it's supposed to.

This decorator prints "Hello world" before returning the decorated function.

def helloworld(ob):
    print "Hello world"
    return ob

@helloworld
def myfunc():
    print "my function"

myfunc()
print myfunc

Hello world
my function
<function myfunc at 0xb78360d4>

A simple decorator that actually does something (and is not broken like the Hello world decorator above)

This decorator is used to print some text before and after calling the decorated function. Most of the time the decorated function is wrapped by a function which calls the decorated function and returns what it returns. ?When is a wrapper not needed?

from functools import wraps

def mydecorator(f):
    @wraps(f)
    def wrapped(*args, **kwargs):
        print "Before decorated function"
        r = f(*args, **kwargs)
        print "After decorated function"
        return r
    return wrapped

@mydecorator
def myfunc(myarg):
    print "my function", myarg
    return "return value"

r = myfunc('asdf')
print r

Before decorated function
my function asdf
After decorated function
return value
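
As for when a wrapper is not needed: my best guess is that it is any time the decorator only has to do something once, at definition time, and the function itself can stay as it is. A registry is one such case. This is my own sketch (Python 3), not an example from Jack's talk, and `REGISTRY` and `register` are made-up names.

```python
REGISTRY = {}


def register(f):
    """Record f under its name and return it unchanged (no wrapper)."""
    REGISTRY[f.__name__] = f
    return f


@register
def myfunc(myarg):
    return "my function %s" % myarg
```

Here `myfunc` is still the original function object, so there is nothing for `functools.wraps` to fix up.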

What if I want to pass arguments to the decorator itself (not the decorated function)?

A decorator takes exactly one argument so you will need a factory to create the decorator. Unlike the previous example, notice how the factory function is called with parentheses, @mydecorator_not_actually(count=5), to produce the real decorator.

from functools import wraps

def mydecorator_not_actually(count):
    def true_decorator(f):
        @wraps(f)
        def wrapped(*args, **kwargs):
            for i in range(count):
                print "Before decorated function"
            r = f(*args, **kwargs)
            for i in range(count):
                print "After decorated function"
            return r
        return wrapped
    return true_decorator

@mydecorator_not_actually(count=5)
def myfunc(myarg):
    print "my function", myarg
    return "return value"

r = myfunc('asdf')
print r

Before decorated function
Before decorated function
Before decorated function
Before decorated function
Before decorated function
my function asdf
After decorated function
After decorated function
After decorated function
After decorated function
After decorated function
return value
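It may help to spell out what the `@factory(args)` line does: the factory is called first, and whatever it returns is then applied to the function. Here is a small Python 3 sketch; `repeat_message` is a hypothetical stand-in for the factory above that returns a list instead of printing, so the result is easy to inspect.

```python
from functools import wraps

def repeat_message(count):  # hypothetical stand-in for the factory above
    def true_decorator(f):
        @wraps(f)
        def wrapped(*args, **kwargs):
            log = ["before"] * count
            log.append(f(*args, **kwargs))
            log.extend(["after"] * count)
            return log
        return wrapped
    return true_decorator

def myfunc(myarg):
    return "my function " + myarg

# These two lines are what "@repeat_message(count=2)" does in one step.
decorator = repeat_message(count=2)
myfunc = decorator(myfunc)

print(myfunc("asdf"))
print(myfunc.__name__)  # preserved by functools.wraps
```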

References / See also

The Decorator Library on the Python wiki
Understanding Python decorators - Stack Overflow

How to list attributes of an EC2 instance with Python and boto

2010-03-08T12:00:23-08:00

Here's how to find out information about your Amazon EC2 instances using the Python boto library.

Install boto

Install pip
Install boto
```
sudo pip install boto
```

Example

from pprint import pprint
from boto import ec2

AWS_ACCESS_KEY_ID = 'XXXXXXXXXXXXXXXXXX'
AWS_SECRET_ACCESS_KEY = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'

ec2conn = ec2.connection.EC2Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
reservations = ec2conn.get_all_instances()
instances = [i for r in reservations for i in r.instances]
for i in instances:
    pprint(i.__dict__)
    break # remove this to list all instances

Results:

{'_in_monitoring_element': False,
 'ami_launch_index': u'0',
 'architecture': u'x86_64',
 'block_device_mapping': {},
 'connection': EC2Connection:ec2.amazonaws.com,
 'dns_name': u'ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com',
 'id': u'i-xxxxxxxx',
 'image_id': u'ami-xxxxxxxx',
 'instanceState': u'\n                    ',
 'instance_class': None,
 'instance_type': u'm1.large',
 'ip_address': u'xxx.xxx.xxx.xxx',
 'item': u'\n                ',
 'kernel': None,
 'key_name': u'FARM-xxxx',
 'launch_time': u'2009-10-27T17:10:22.000Z',
 'monitored': False,
 'monitoring': u'\n                    ',
 'persistent': False,
 'placement': u'us-east-1d',
 'previous_state': None,
 'private_dns_name': u'ip-10-xxx-xxx-xxx.ec2.internal',
 'private_ip_address': u'10.xxx.xxx.xxx',
 'product_codes': [],
 'public_dns_name': u'ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com',
 'ramdisk': None,
 'reason': '',
 'region': RegionInfo:us-east-1,
 'requester_id': None,
 'rootDeviceType': u'instance-store',
 'root_device_name': None,
 'shutdown_state': None,
 'spot_instance_request_id': None,
 'state': u'running',
 'state_code': 16,
 'subnet_id': None,
 'vpc_id': None}
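Dumping `__dict__` is handy for exploring, but usually only a few attributes are needed. The sketch below shows the same flattening comprehension followed by a filter on `state`. It does not talk to AWS: the `SimpleNamespace` objects are stand-ins I made up for what `get_all_instances()` returns, using attribute names from the output above. It is written in Python 3 syntax.

```python
from types import SimpleNamespace

# Stand-ins for boto reservations/instances; not real boto objects.
reservations = [
    SimpleNamespace(instances=[
        SimpleNamespace(id="i-aaaa", state="running", instance_type="m1.large"),
    ]),
    SimpleNamespace(instances=[
        SimpleNamespace(id="i-bbbb", state="stopped", instance_type="m1.small"),
        SimpleNamespace(id="i-cccc", state="running", instance_type="m1.small"),
    ]),
]

# Flatten the reservations into one list of instances.
instances = [i for r in reservations for i in r.instances]

# Keep only a couple of attributes for the running instances.
running = [(i.id, i.instance_type) for i in instances if i.state == "running"]
print(running)
```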

For more information

~~Declarative~~ Functional vs. imperative style in Python

2010-02-26T10:11:26-08:00

Which is better for Python? An imperative / procedural style or a ~~declarative /~~ functional style? Why? I put some examples I've encountered below.

Update 2010-04-10: As dan and pete pointed out, the examples below are not declarative, so I replaced declarative with functional. I've also added a better method for example 1 suggested by deno.

Example 1: For each key in a list of S3 keys, download the associated file and return a list of the filenames.

Imperative:

def download_files(keylist):
    filelist = []
    for key in keylist:
        filename = '%s/%s' % (TEMP_DIR, key.name)
        key.get_contents_to_filename(filename)
        filelist.append(filename)
    return filelist

Functional:

def download_files(keylist):
    def get_file(key):
        filename = '%s/%s' % (TEMP_DIR, key.name)
        key.get_contents_to_filename(filename)
        return filename
    return [get_file(key) for key in keylist]

Here is a better method suggested by deno:

def download_files(keylist):
    for key in keylist:
        filename = '%s/%s' % (TEMP_DIR, key.name)
        key.get_contents_to_filename(filename)
        yield filename

Example 2: Given a dict, strip the percent character from the value whose key is 'df_use_percent' and return the dict.

Imperative:

def post_process(data):
    newdict = {}
    for k, v in data.iteritems():
        if k == 'df_use_percent':
            v = v.rstrip('%')
        newdict[k] = v
    return newdict

Functional:

def post_process(data):
    def remove_percent(k, v):
        if k == 'df_use_percent':
            v = v.rstrip('%')
        return (k, v)
    return dict([remove_percent(k, v) for k, v in data.iteritems()])

Example 3: Parse a ~/.ssh/config file and return a dict of ssh options for a given host.

Imperative:

def get_ssh_options(host):
    def get_value(line, key_arg):
        m = re.search(r'^\s*%s\s+(.+)\s*$' % key_arg, line, re.I)
        if m:
            return m.group(1)
        else:
            return ''

    mydict = {}
    for line in file(SSH_CONFIG_FILE):
        line = line.strip()
        line = re.sub(r'#.*$', '', line)
        if not line:
            continue
        if get_value(line, 'Host') != host:
            continue
        if get_value(line, 'Host') == '':
            k, v = line.lower().split(None, 1)
            mydict[k] = v
    return mydict

Functional:

def get_ssh_options(host):
    def remove_comment(line):
        return re.sub(r'#.*$', '', line)
    def get_value(line, key_arg):
        m = re.search(r'^\s*%s\s+(.+)\s*$' % key_arg, line, re.I)
        if m:
            return m.group(1)
        else:
            return ''
    def not_the_host(line):
        return get_value(line, 'Host') != host
    def not_a_host(line):
        return get_value(line, 'Host') == ''

    lines = [line.strip() for line in file(SSH_CONFIG_FILE)]
    comments_removed = [remove_comment(line) for line in lines]
    blanks_removed = [line for line in comments_removed if line]
    top_removed = list(itertools.dropwhile(not_the_host, blanks_removed))[1:]
    goodpart = itertools.takewhile(not_a_host, top_removed)
    return dict([line.lower().split(None, 1) for line in goodpart])

Example 4: Summation

Imperative:

total = 0
for item in item_list:
    total += item.value

Functional:

total = sum(item.value for item in item_list)
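For Example 2, newer Pythons (2.7 and 3.x) have dict comprehensions, which read a little more directly than `dict([...])` with a helper function. Here is a sketch in Python 3 syntax, where `iteritems()` becomes `items()`; the `sample` data is made up.

```python
def post_process(data):
    # Build a new dict; only the 'df_use_percent' value is changed.
    return {
        k: v.rstrip('%') if k == 'df_use_percent' else v
        for k, v in data.items()
    }

sample = {'df_use_percent': '42%', 'hostname': 'us-apa1'}  # made-up data
print(post_process(sample))
```

Like the two versions above, this leaves the input dict unmodified and returns a new one.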

Python paramiko notes

2010-02-24T12:05:01-08:00

Paramiko is a Python SSH package. The following example makes use of my ssh config file, creates an SSH client, runs a command on a remote server, and reads a remote file using SFTP. Paramiko is released under the GNU LGPL.

Install paramiko

Install pip
Install paramiko
```
sudo pip install paramiko
```

Example

from paramiko import SSHClient, SSHConfig

# ssh config file
config = SSHConfig()
config.parse(open('/home/eliot/.ssh/config'))
o = config.lookup('testapa')

# ssh client
ssh_client = SSHClient()
ssh_client.load_system_host_keys()
ssh_client.connect(o['hostname'], username=o['user'], key_filename=o['identityfile'])

# run a command
print "\nRun a command"
cmd = 'ps aux'
stdin, stdout, stderr = ssh_client.exec_command(cmd)
for i, line in enumerate(stdout):
    line = line.rstrip()
    print "%d: %s" % (i, line)
    if i >= 9:
        break

# open a remote file
print "\nOpen a remote file"
sftp_client = ssh_client.open_sftp()
sftp_file = sftp_client.open('/var/log/messages')
for i, line in enumerate(sftp_file):
    print "%d: %s" % (i, line[:15])
    if i >= 9:
        break
sftp_file.close()
sftp_client.close()

# close ssh client
ssh_client.close()

Results:

Run a command
0: USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1: root         1  0.0  0.0   1920   536 ?        S     2009   0:00 /sbin/init
2: root         2  0.0  0.0      0     0 ?        S     2009   0:00 [migration/0]
3: root         3  0.0  0.0      0     0 ?        SN    2009   0:00 [ksoftirqd/0]
4: root         4  0.0  0.0      0     0 ?        S     2009   0:00 [watchdog/0]
5: root         5  0.0  0.0      0     0 ?        S<    2009   0:00 [events/0]
6: root         6  0.0  0.0      0     0 ?        S<    2009   0:00 [khelper]
7: root         7  0.0  0.0      0     0 ?        S<    2009   0:00 [kthread]
8: root         8  0.0  0.0      0     0 ?        S<    2009   0:00 [xenwatch]
9: root         9  0.0  0.0      0     0 ?        S<    2009   0:00 [xenbus]

Open a remote file
0: Feb 21 06:47:03
1: Feb 21 07:14:03
2: Feb 21 07:34:03
3: Feb 21 07:54:04
4: Feb 21 08:14:04
5: Feb 21 08:34:05
6: Feb 21 08:54:05
7: Feb 21 09:14:05
8: Feb 21 09:34:06
9: Feb 21 09:54:06

Some SFTP helper code

Added 2011-09-15

import errno
import os.path

import paramiko


class SFTPHelper(object):

    def connect(self, hostname, **ssh_kwargs):
        """Create a ssh client and a sftp client

        **ssh_kwargs are passed directly to paramiko.SSHClient.connect()
        """
        self.sshclient = paramiko.SSHClient()
        self.sshclient.load_system_host_keys()
        self.sshclient.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        self.sshclient.connect(hostname, **ssh_kwargs)
        self.sftpclient = self.sshclient.open_sftp()

    def remove_directory(self, path):
        """Remove remote directory that may contain files.
        It does not support directories that contain subdirectories
        """
        if self.exists(path):
            for filename in self.sftpclient.listdir(path):
                filepath = os.path.join(path, filename)
                self.sftpclient.remove(filepath)
            self.sftpclient.rmdir(path)

    def put_directory(self, localdir, remotedir):
        """Put a directory of files on the remote server
        Create the remote directory if it does not exist
        Does not support directories that contain subdirectories
        Return the number of files transferred
        """
        if not self.exists(remotedir):
            self.sftpclient.mkdir(remotedir)
        count = 0
        for filename in os.listdir(localdir):
            self.sftpclient.put(
                os.path.join(localdir, filename),
                os.path.join(remotedir, filename))
            count += 1
        return count

    def exists(self, path):
        """Return True if the remote path exists
        """
        try:
            self.sftpclient.stat(path)
        except IOError, e:
            if e.errno == errno.ENOENT:
                return False
            raise
        else:
            return True
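The `except IOError, e` line above is Python 2 syntax. Below is a sketch of the same `exists()` check in Python 3 syntax, written as a plain function so it can be tried without a server. `FakeSFTPClient` is a stand-in I made up; it assumes, as the original code does, that a missing path surfaces as an `IOError` whose `errno` is `ENOENT`.

```python
import errno

class FakeSFTPClient:
    """Stand-in for paramiko's SFTP client, for illustration only."""
    def __init__(self, paths):
        self.paths = set(paths)

    def stat(self, path):
        if path not in self.paths:
            raise IOError(errno.ENOENT, "No such file", path)
        return object()

def exists(sftpclient, path):
    """Return True if the remote path exists."""
    try:
        sftpclient.stat(path)
    except IOError as e:  # Python 3 spelling of "except IOError, e"
        if e.errno == errno.ENOENT:
            return False
        raise  # any other error is unexpected, so let it propagate
    return True

client = FakeSFTPClient(["/tmp/upload"])
print(exists(client, "/tmp/upload"), exists(client, "/tmp/missing"))
```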

Python MongoDB notes

2010-02-22T15:01:23-08:00

MongoDB is a popular new schemaless, document-oriented, NoSQL database. It is useful for logging and real-time analytics. I'm working on a tool to store log files from multiple remote hosts in MongoDB, then analyze them in real time and print pretty plots. My work in progress is located on github.

Here are my first steps using PyMongo. I store an Apache access log to MongoDB and then query it for the number of requests in the last minute. I am running on Ubuntu Karmic 32-bit (though I think MongoDB really wants to run on 64-bit).

Install and run MongoDB

Download and install MongoDB (Reference)

cd ~/lib
curl http://downloads.mongodb.org/linux/mongodb-linux-i686-latest.tgz | tar zx
ln -s mongodb-linux-i686-2010-02-22 mongodb

Create data directory
```
mkdir -p ~/var/mongodb/db
```

Run MongoDB (Reference)

~/lib/mongodb/bin/mongod --dbpath ~/var/mongodb/db

Install PyMongo

Install pip
Install PyMongo (Reference)
```
sudo pip install pymongo
```

Simple Example

writer.py:

import re
from datetime import datetime
from subprocess import Popen, PIPE, STDOUT
from pymongo import Connection
from pymongo.errors import CollectionInvalid

HOST = 'us-apa1'
LOG_PATH = '/var/log/apache2/http-mydomain.com-access.log'
DB_NAME = 'mydb'
COLLECTION_NAME = 'apache_access'
MAX_COLLECTION_SIZE = 5 # in megabytes

def main():
    # connect to mongodb
    mongo_conn = Connection()
    mongo_db = mongo_conn[DB_NAME]
    try:
        mongo_coll = mongo_db.create_collection(COLLECTION_NAME,
                                                capped=True,
                                                size=MAX_COLLECTION_SIZE*1048576)
    except CollectionInvalid:
        mongo_coll = mongo_db[COLLECTION_NAME]

    # open remote log file
    cmd = 'ssh -f %s tail -f %s' % (HOST, LOG_PATH)
    p = Popen(cmd, shell=True, stdout=PIPE, stderr=STDOUT)

    # parse and store data
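    while True:
        line = p.stdout.readline()
        if not line:
            break  # the ssh/tail process exited
        data = parse_line(line)
        if not data:
            continue  # line did not match the expected log format
        data['time'] = convert_time(data['time'])
        mongo_coll.insert(data)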

def parse_line(line):
    """Apache combined log format
    %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"
    """
    m = re.search(' '.join([
                r'(?P<host>(\d+\.){3}\d+)',
                r'.*',
                r'\[(?P<time>[^\]]+)\]',
                r'"\S+ (?P<url>\S+)',
                ]), line)
    if m:
        return m.groupdict()
    else:
        return {}

def convert_time(time_str):
    time_str = re.sub(r' -\d{4}', '', time_str)
    return datetime.strptime(time_str, "%d/%b/%Y:%H:%M:%S")

if __name__ == '__main__':
    main()
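The parsing half of writer.py can be tried on its own, without MongoDB or ssh. The sketch below reuses the same regular expression and time conversion in Python 3 syntax; the `sample` log line is made up.

```python
import re
from datetime import datetime

LINE_RE = re.compile(' '.join([
    r'(?P<host>(\d+\.){3}\d+)',
    r'.*',
    r'\[(?P<time>[^\]]+)\]',
    r'"\S+ (?P<url>\S+)',
]))

def parse_line(line):
    """Return a dict with host, time and url, or {} if the line does not match."""
    m = LINE_RE.search(line)
    return m.groupdict() if m else {}

def convert_time(time_str):
    # Drop the UTC offset (only a negative offset such as " -0800" is handled).
    time_str = re.sub(r' -\d{4}', '', time_str)
    return datetime.strptime(time_str, "%d/%b/%Y:%H:%M:%S")

# Made-up line in Apache combined log format.
sample = ('127.0.0.1 - - [22/Feb/2010:15:01:23 -0800] '
          '"GET /blog/ HTTP/1.1" 200 2326 "-" "Mozilla/5.0"')
data = parse_line(sample)
data['time'] = convert_time(data['time'])
print(data)
```

Note that `groupdict()` only returns the named groups, so the unnamed `(\d+\.)` group inside the host pattern does not show up in the result.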

reader.py:

import time
from datetime import datetime, timedelta
from pymongo import Connection

DB_NAME = 'mydb'
COLLECTION_NAME = 'apache_access'

def main():
    # connect to mongodb
    mongo_conn = Connection()
    mongo_db = mongo_conn[DB_NAME]
    mongo_coll = mongo_db[COLLECTION_NAME]

    # find the number of requests in the last minute
    while True:
        d = datetime.now() - timedelta(seconds=60)
        N_requests = mongo_coll.find({'time': {'$gt': d}}).count()
        print 'Requests in the last minute:',  N_requests
        time.sleep(2)

if __name__ == '__main__':
    main()

Running python writer.py in one terminal and python reader.py in another terminal, I get the following results:

Requests in the last minute: 13
Requests in the last minute: 14
Requests in the last minute: 14
Requests in the last minute: 14
Requests in the last minute: 13
Requests in the last minute: 14
Requests in the last minute: 15
...

Using Python to write to an Excel / OpenOffice Calc spreadsheet on Ubuntu Linux

2010-02-15T17:41:25-08:00

Via Matt Harrison's blog post, here is how to write Excel or OpenOffice.org Calc spreadsheet files using Python and the xlwt library. Xlwt is a fork of pyExcelerator which handles only writing spreadsheet files. For reading spreadsheets, see xlrd. Note, these libraries don't use COM, so they will work on non-Windows OSes, such as Linux. For more information, see Matt's blog post. He even has a PDF cheat sheet.

Install pip
Install xlwt
```
sudo pip install xlwt
```

Create an example script:

import xlwt

DATA = (("The Essential Calvin and Hobbes", 1988,),
        ("The Authoritative Calvin and Hobbes", 1990,),
        ("The Indispensable Calvin and Hobbes", 1992,),
        ("Attack of the Deranged Mutant Killer Monster Snow Goons", 1992,),
        ("The Days Are Just Packed", 1993,),
        ("Homicidal Psycho Jungle Cat", 1994,),
        ("There's Treasure Everywhere", 1996,),
        ("It's a Magical World", 1996,),)

wb = xlwt.Workbook()
ws = wb.add_sheet("My Sheet")
for i, row in enumerate(DATA):
    for j, col in enumerate(row):
        ws.write(i, j, col)
ws.col(0).width = 256 * max([len(row[0]) for row in DATA])
wb.save("myworkbook.xls")

Results:

How to install MySQLdb in a virtualenv on Ubuntu Karmic

2010-02-15T12:07:08-08:00

Install dependencies
```
sudo apt-get build-dep python-mysqldb
```

Install MySQLdb with pip/virtualenv

cd ~/lib/python-environments
virtualenv --distribute --no-site-packages default
pip install -E default/ MySQL-python

EnvironmentError: mysql_config not found

The first time, I did not install the dependencies and got the following error:

$ pip install -E default/ http://sourceforge.net/projects/mysql-python/files/mysql-python-test/1.2.3c1/MySQL-python-1.2.3c1.tar.gz/download
Downloading/unpacking http://sourceforge.net/projects/mysql-python/files/mysql-python-test/1.2.3c1/MySQL-python-1.2.3c1.tar.gz/download
  Downloading download (89Kb): 89Kb downloaded
  Running setup.py egg_info for package from http://sourceforge.net/projects/mysql-python/files/mysql-python-test/1.2.3c1/MySQL-python-1.2.3c1.tar.gz/download
    sh: mysql_config: not found
    Traceback (most recent call last):
      File "<string>", line 14, in <module>
      File "/tmp/pip-ZR_ZwM-build/setup.py", line 15, in <module>
        metadata, options = get_config()
      File "setup_posix.py", line 43, in get_config
        libs = mysql_config("libs_r")
      File "setup_posix.py", line 24, in mysql_config
        raise EnvironmentError("%s not found" % (mysql_config.path,))
    EnvironmentError: mysql_config not found
    Complete output from command python setup.py egg_info:
    sh: mysql_config: not found

Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "/tmp/pip-ZR_ZwM-build/setup.py", line 15, in <module>
    metadata, options = get_config()
  File "setup_posix.py", line 43, in get_config
    libs = mysql_config("libs_r")
  File "setup_posix.py", line 24, in mysql_config
    raise EnvironmentError("%s not found" % (mysql_config.path,))
EnvironmentError: mysql_config not found

----------------------------------------
Command python setup.py egg_info failed with error code 1
Storing complete log in ./pip-log.txt

How to install pip on Ubuntu

2010-02-15T11:42:52-08:00

Pip is a better alternative to Easy Install for installing Python packages. It is most "nutritious" when used with its companion virtualenv. For more information on pip and virtualenv see my blog post: Notes on using pip and virtualenv with Django.

Install pip and virtualenv for Ubuntu 10.10 Maverick and newer

$ sudo apt-get install python-pip python-dev build-essential 
$ sudo pip install --upgrade pip 
$ sudo pip install --upgrade virtualenv

For older versions of Ubuntu

Install Easy Install

$ sudo apt-get install python-setuptools python-dev build-essential

Install pip
```
$ sudo easy_install pip 
```

Install virtualenv

$ sudo pip install --upgrade virtualenv

Python setdefault example

2010-02-09T17:10:22-08:00

I always forget how to use Python's setdefault dictionary operation so here is a quick example.

What I want:

DATA_SOURCE = (('key1', 'value1'),
               ('key1', 'value2'),
               ('key2', 'value3'),
               ('key2', 'value4'),
               ('key2', 'value5'),)

newdata = {}
for k, v in DATA_SOURCE:
    if newdata.has_key(k):
        newdata[k].append(v)
    else:
        newdata[k] = [v]
print newdata

Results:

{'key2': ['value3', 'value4', 'value5'], 'key1': ['value1', 'value2']}

Better way using setdefault:

newdata = {}
for k, v in DATA_SOURCE:
    newdata.setdefault(k, []).append(v)
print newdata

The results are the same.
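Another option from the standard library is `collections.defaultdict`, which creates the empty list automatically the first time a key is seen. Here is a sketch in Python 3 syntax using the same data:

```python
from collections import defaultdict

DATA_SOURCE = (('key1', 'value1'),
               ('key1', 'value2'),
               ('key2', 'value3'),
               ('key2', 'value4'),
               ('key2', 'value5'),)

newdata = defaultdict(list)
for k, v in DATA_SOURCE:
    # A missing key gets a fresh empty list from the list() factory.
    newdata[k].append(v)
print(dict(newdata))
```

One difference to keep in mind: merely reading a missing key from a `defaultdict` inserts it, whereas `setdefault` only inserts where it is explicitly called.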

A hack to copy files between two remote hosts using Python

2010-02-08T12:51:29-08:00

I sometimes need to copy a file (such as a database dump) between two remote hosts on EC2. Normally this involves a few steps: scp'ing the ssh keyfile to Host 1, ssh'ing to Host 1, looking up the address for Host 2, then scp'ing the desired file from Host 1 to Host 2.

I was excited to read in the man page that scp can copy files between two remote hosts directly. However, it didn't work for me. Apparently, running scp host1:myfile host2: is like running ssh host1 scp myfile host2: so I still need the address of host2 and my ssh keyfile on host1.

My inability to let go of this small efficiency increaser led me to (what else?) write a Python script. I know this is a hack, so if you know of a better way of doing this, let me know.

The script parses my ~/.ssh/config file to find the ssh keyfile and address for host 2, uses scp to copy the ssh keyfile to host 1, then runs the ssh host1 scp ... command with the appropriate options filled in. The script captures all of the ssh options for host 2 and passes them on the command line to scp via the -o command-line option. Note: I only tested this with the User option; I don't know if all ssh options will work.

Warning: the script disables the StrictHostKeyChecking SSH option, so you are more vulnerable to a man-in-the-middle attack.

Update 2010-02-16: I've found there is already a SSH config file parser in the paramiko library. The source can be viewed on github.

Update 2010-05-04: I modified my code to use the paramiko library and also allow command line options to be passed directly to the scp command. The latest code is available in my github repository remote-tools.

import itertools
import os
import re
import sys

SSH_CONFIG_FILE = '/home/eliot/.ssh/config'

def main():
    host1, path1 = sys.argv[1].split(':', 1)
    host2, path2 = sys.argv[2].split(':', 1)

    o = get_ssh_options(host2)
    keyfile_remote = '/tmp/%s' % os.path.basename(o['identityfile'])
    ssh_options = ' -o'.join(['='.join([k, v]) for k, v in o.iteritems()
                              if k != 'hostname' and k != 'identityfile'])

    run('scp %s %s:%s' % (o['identityfile'], host1, keyfile_remote))
    run('ssh %s scp -p -i %s -oStrictHostKeyChecking=no -o%s %s %s:%s' % (
            host1, keyfile_remote, ssh_options, path1, o['hostname'], path2))

def get_ssh_options(host):
    """Parse ~/.ssh/config file and return a dict of ssh options for host
    Note: dict keys are all lowercase
    """
    def remove_comment(line):
        return re.sub(r'#.*$', '', line)
    def get_value(line, key_arg):
        m = re.search(r'^\s*%s\s+(.+)\s*$' % key_arg, line, re.I)
        if m:
            return m.group(1)
        else:
            return ''
    def not_the_host(line):
        return get_value(line, 'Host') != host
    def not_a_host(line):
        return get_value(line, 'Host') == ''

    lines = [line.strip() for line in file(SSH_CONFIG_FILE)]
    comments_removed = [remove_comment(line) for line in lines]
    blanks_removed = [line for line in comments_removed if line]
    top_removed = list(itertools.dropwhile(not_the_host, blanks_removed))[1:]
    goodpart = itertools.takewhile(not_a_host, top_removed)
    return dict([line.lower().split(None, 1) for line in goodpart])

def run(cmd):
    print cmd
    os.system(cmd)

if __name__ == '__main__':
    main()

Here is an example ~/.ssh/config file:

Host testhost1
  User root
  Hostname 48.879.24.567
  IdentityFile /home/eliot/.ssh/test_keyfile

Host testhost2
  User root
  Hostname 56.384.58.212
  IdentityFile /home/eliot/.ssh/test_keyfile

Here is an example run. It copies /tmp/testfile from testhost1 to the same path on testhost2.

python scp_r2r.py testhost1:/tmp/testfile testhost2:/tmp/testfile

Here is the console output:

scp /home/eliot/.ssh/test_keyfile testhost1:/tmp/test_keyfile
test_keyfile                                              100% 1674     1.6KB/s   00:00
ssh testhost1 scp -p -i /tmp/test_keyfile -oStrictHostKeyChecking=no -ouser=root /tmp/testfile 56.384.58.212:/tmp/testfile

One inconvenience is that it doesn't show the progress for the main transfer. If anyone knows how I can fix this, please let me know.
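The parsing step can be tried without touching a real config file. The sketch below is a Python 3 variant of `get_ssh_options` that takes the config text as a string; that parameter is my change, not part of the script above. It also lowercases only the option names, so a mixed-case keyfile path would survive. `scp_option_args` is a hypothetical helper showing how the `-o` options end up on the command line.

```python
import itertools
import re

CONFIG = """\
Host testhost1
  User root
  Hostname 48.879.24.567
  IdentityFile /home/eliot/.ssh/test_keyfile

Host testhost2
  User root
  Hostname 56.384.58.212
  IdentityFile /home/eliot/.ssh/test_keyfile
"""

def get_ssh_options(config_text, host):
    """Return a dict of ssh options (lowercase keys) for host."""
    def get_value(line, key):
        m = re.search(r'^\s*%s\s+(.+)\s*$' % key, line, re.I)
        return m.group(1) if m else ''

    # Strip comments and whitespace, then drop blank lines.
    lines = (re.sub(r'#.*$', '', l).strip() for l in config_text.splitlines())
    lines = [l for l in lines if l]
    # Skip everything up to and including the "Host <host>" line...
    after_host = list(itertools.dropwhile(
        lambda l: get_value(l, 'Host') != host, lines))[1:]
    # ...then keep lines until the next "Host" line.
    block = itertools.takewhile(lambda l: get_value(l, 'Host') == '', after_host)
    options = {}
    for line in block:
        key, value = line.split(None, 1)
        options[key.lower()] = value
    return options

def scp_option_args(options):
    """Hypothetical helper: turn the options into -okey=value arguments."""
    return ['-o%s=%s' % (k, v) for k, v in sorted(options.items())
            if k not in ('hostname', 'identityfile')]

opts = get_ssh_options(CONFIG, 'testhost2')
print(opts)
print(scp_option_args(opts))
```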

Iterating over lines in multiple Linux log files using Python

2010-01-23T00:07:51-08:00

I needed to parse through my Nginx log files to debug a problem. However, the logs are separated into many files, most of them are gzipped, and I wanted the ordering within the files reversed. So I abstracted the logic to handle this into a function. Now I can pass a glob pattern such as /var/log/nginx/cache.log* to my function, and iterate over each line in all the files as if they were one file. Here is my function. Let me know if there is a better way to do this.

Update 2010-02-24: To handle multiple log files on a remote host, see my script on github.

import glob
import gzip
import re
 
def get_lines(log_glob):
    """Return an iterator of each line in all files matching log_glob.
    Lines are sorted most recent first.
    Files are sorted by the integer in the suffix of the log filename.
    Suffix may be one of the following:
         .X (where X is an integer)
         .X.gz (where X is an integer)
    If the filename does not end in either suffix, it is treated as if X=0
    """
    def sort_by_suffix(a, b):
        def get_suffix(fname):
            m = re.search(r'.(?:\.(\d+))?(?:\.gz)?$', fname)
            if m.lastindex:
                suf = int(m.group(1))
            else:
                suf = 0
            return suf
        return get_suffix(a) - get_suffix(b)
 
    filelist = glob.glob(log_glob)
    for filename in sorted(filelist, sort_by_suffix):
        if filename.endswith('.gz'):
            fh = gzip.open(filename)
        else:
            fh = open(filename)
        for line in reversed(fh.readlines()):
            yield line
        fh.close()

Here is an example run on my machine. It prints the first 15 characters of every 1000th line of all my syslog files.

for i, line in enumerate(get_lines('/var/log/syslog*')):
    if not i % 1000:
        print line[:15]

File listing:

$ ls -l /var/log/syslog*
-rw-r----- 1 syslog adm 169965 2010 01/23 00:18 /var/log/syslog
-rw-r----- 1 syslog adm 350334 2010 01/22 08:03 /var/log/syslog.1
-rw-r----- 1 syslog adm  18078 2010 01/21 07:49 /var/log/syslog.2.gz
-rw-r----- 1 syslog adm  16700 2010 01/20 07:43 /var/log/syslog.3.gz
-rw-r----- 1 syslog adm  18197 2010 01/19 07:52 /var/log/syslog.4.gz
-rw-r----- 1 syslog adm  15737 2010 01/18 07:45 /var/log/syslog.5.gz
-rw-r----- 1 syslog adm  16157 2010 01/17 07:54 /var/log/syslog.6.gz
-rw-r----- 1 syslog adm  20285 2010 01/16 07:48 /var/log/syslog.7.gz

Results:

Jan 22 23:57:01
Jan 22 14:09:01
Jan 22 03:51:01
Jan 21 17:35:01
Jan 21 14:37:33
Jan 21 08:35:01
Jan 20 22:12:01
Jan 20 11:56:01
Jan 20 01:41:01
Jan 19 15:18:01
Jan 19 04:53:01
Jan 18 18:35:01
Jan 18 08:40:01
Jan 17 22:10:01
Jan 17 11:32:01
Jan 17 01:05:01
Jan 16 14:27:01
Jan 16 04:01:01
Jan 15 17:25:01
Jan 15 08:50:01
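In Python 3, `sorted` no longer accepts a comparison function as its second argument, so the `sort_by_suffix` approach needs a `key` function instead. Here is a sketch of just the ordering step in Python 3 syntax. It uses a slightly simplified regular expression of my own rather than the one above, and the file names are examples.

```python
import re

def get_suffix(fname):
    """Return the integer X from a .X or .X.gz suffix, or 0 if there is none."""
    m = re.search(r'\.(\d+)(?:\.gz)?$', fname)
    return int(m.group(1)) if m else 0

files = ['/var/log/syslog.2.gz', '/var/log/syslog',
         '/var/log/syslog.10.gz', '/var/log/syslog.1']

# Sort numerically by suffix, so .10 comes after .2 (unlike a plain string sort).
ordered = sorted(files, key=get_suffix)
print(ordered)
```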

"ImportError: No module named pstats" error on Ubuntu

2010-01-21T00:09:17-08:00

Even though the pstats module (used by cProfile) is part of the Python Standard Library, Ubuntu requires installing a separate package because of its non-free license. For more information, see this Ubuntu bug report. (thanks Luke)

Running this on Ubuntu Karmic:

import cProfile
def my_super_slow_routine(): pass
cProfile.run('my_super_slow_routine()')

Produces this error:

Traceback (most recent call last):
  File "test_pstats_error.py", line 3, in <module>
    cProfile.run('my_super_slow_routine()')
  File "/usr/lib/python2.6/cProfile.py", line 36, in run
    result = prof.print_stats(sort)
  File "/usr/lib/python2.6/cProfile.py", line 80, in print_stats
    import pstats
ImportError: No module named pstats

Solution:

sudo apt-get install python-profiler

Note: the multiverse repository must be enabled via either update-manager > "Settings..." > "Ubuntu Software" or directly editing /etc/apt/sources.list. (thanks Luke)

Wmii Python script to monitor remote machines

2009-12-22T01:00:05-08:00

I like to monitor our web servers by ssh'ing into the remote machine and watching "top", tailing log files, etc. Normally, I open a terminal, ssh into the remote machine, run the monitoring command (e.g. "top"), then repeat for the rest of the remote machines. Then I adjust the window sizes so I can see everything at once.

My window manager, wmii, is great for tiling a bunch of windows at once. It is also scriptable with Python, so I wrote a Python script to create my web server monitoring view. Below is my script. I also put a video on YouTube.

#!/usr/bin/env python

import os
import time

NGINX_MONITOR_CMD = "tail --follow=name /var/log/nginx/cache.log | grep --color -E '(HIT|MISS|EXPIRED|STALE|UPDATING|\*\*\*)'"
APACHE_MONITOR_CMD = "top"
MYSQL_MONITOR_CMD = "mysqladmin extended -i10 -r | grep -i 'questions\|aborted_clients\|opened_tables\|slow_queries\|threads_created' "

CMDS_COL1 = ['urxvt -title "Nginx 1" -e ssh -t us-ng1 "%s" &' % NGINX_MONITOR_CMD,
             'urxvt -title "Nginx 2" -e ssh -t us-ng2 "%s" &' % NGINX_MONITOR_CMD,
             ]
CMDS_COL2 = ['urxvt -title "Apache 1" -e ssh -t us-med1 "%s" &' % APACHE_MONITOR_CMD,
             'urxvt -title "Apache 2" -e ssh -t us-med2 "%s" &' % APACHE_MONITOR_CMD,
             'urxvt -title "Apache 3" -e ssh -t us-med3 "%s" &' % APACHE_MONITOR_CMD,
             ]
CMDS_COL3 = ['urxvt -title "MySQL 1" -e ssh -t us-my1 "%s" &' % MYSQL_MONITOR_CMD,
             'urxvt -title "MySQL 2" -e ssh -t us-my2 "%s" &' % MYSQL_MONITOR_CMD,
             ]
COLUMNS = [CMDS_COL1, CMDS_COL2, CMDS_COL3]

def create_windows():
    for i, col in enumerate(COLUMNS):
        cindex = str(i+1)
        for cmd in col:
            os.system(cmd)
            time.sleep(1)
            os.system('wmiir xwrite /tag/sel/ctl send sel %s' % cindex)
        os.system('wmiir xwrite /tag/sel/ctl colmode %s default-max' % cindex)
    os.system('wmii.py 45.5 31.5 23')

if __name__ == '__main__':
    create_windows()

Note 1: The script above uses another script I wrote previously, wmii.py, to set the column widths.

Note 2: The remote server addresses are specified by the nicknames us-ng1, us-ng2, us-med1, etc. configured in my ~/.ssh/config file as described here.

Note 3 (on using ssh and top): I first tried doing ssh host top, but this gave me a "TERM environment variable not set." error. I then tried ssh host "export TERM=rxvt-unicode; top", but this gave me a "top: failed tty get" error. The solution that worked for me was to use the -t option with ssh, e.g. ssh -t host top. This is what I used in the script above.

Note 4 (added 2010-03-05): I used "tail --follow=name" instead of "tail -f" so that tail will follow the log file even after it has been rotated. For more information, see the man page for tail.

Note 5 (added 2010-03-05): To prevent your ssh session from timing out, add the following 2 lines to your ~/.ssh/config file (via):

Host *
  ServerAliveInterval 60

Trying out a Retry decorator in Python

2009-11-17T17:29:19-08:00

The Python wiki has a Retry decorator example which retries calling a failure-prone function using an exponential backoff algorithm. I modified it slightly to check for exceptions instead of a False return value to indicate failure. Each time the decorated function throws an exception, the decorator will wait a period of time and retry calling the function until the maximum number of tries is used up. If the decorated function fails on the last try, the exception will occur unhandled.

import time
from functools import wraps


def retry(ExceptionToCheck, tries=4, delay=3, backoff=2, logger=None):
    """Retry calling the decorated function using an exponential backoff.

    http://www.saltycrane.com/blog/2009/11/trying-out-retry-decorator-python/
    original from: http://wiki.python.org/moin/PythonDecoratorLibrary#Retry

    :param ExceptionToCheck: the exception to check. may be a tuple of
        exceptions to check
    :type ExceptionToCheck: Exception or tuple
    :param tries: number of times to try (not retry) before giving up
    :type tries: int
    :param delay: initial delay between retries in seconds
    :type delay: int
    :param backoff: backoff multiplier e.g. value of 2 will double the delay
        each retry
    :type backoff: int
    :param logger: logger to use. If None, print
    :type logger: logging.Logger instance
    """
    def deco_retry(f):

        @wraps(f)
        def f_retry(*args, **kwargs):
            mtries, mdelay = tries, delay
            while mtries > 1:
                try:
                    return f(*args, **kwargs)
                except ExceptionToCheck, e:
                    msg = "%s, Retrying in %d seconds..." % (str(e), mdelay)
                    if logger:
                        logger.warning(msg)
                    else:
                        print msg
                    time.sleep(mdelay)
                    mtries -= 1
                    mdelay *= backoff
            return f(*args, **kwargs)

        return f_retry  # true decorator

    return deco_retry
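The code above uses Python 2 syntax (`except ExceptionToCheck, e` and the `print` statement). What follows is a rough Python 3 port, meant as a sketch rather than a drop-in replacement. The first parameter is renamed to `exceptions`, and `flaky` and `calls` are made up for the demonstration. `delay=0` is used only so the demo runs instantly.

```python
import time
from functools import wraps

def retry(exceptions, tries=4, delay=3, backoff=2, logger=None):
    """Retry the decorated function using an exponential backoff."""
    def deco_retry(f):
        @wraps(f)
        def f_retry(*args, **kwargs):
            mtries, mdelay = tries, delay
            while mtries > 1:
                try:
                    return f(*args, **kwargs)
                except exceptions as e:
                    msg = "%s, Retrying in %s seconds..." % (e, mdelay)
                    if logger:
                        logger.warning(msg)
                    else:
                        print(msg)
                    time.sleep(mdelay)
                    mtries -= 1
                    mdelay *= backoff
            # Last try: any exception raised here propagates to the caller.
            return f(*args, **kwargs)
        return f_retry
    return deco_retry

calls = []

@retry(ValueError, tries=3, delay=0)
def flaky():
    # Fails on the first two calls, succeeds on the third.
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("Fail")
    return "ok"

result = flaky()
print(result, len(calls))
```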

Try an "always fail" case

@retry(Exception, tries=4)
def test_fail(text):
    raise Exception("Fail")

test_fail("it works!")

Results:

Fail, Retrying in 3 seconds...
Fail, Retrying in 6 seconds...
Fail, Retrying in 12 seconds...
Traceback (most recent call last):
  File "retry_decorator.py", line 47, in <module>
    test_fail("it works!")
  File "retry_decorator.py", line 26, in f_retry
    f(*args, **kwargs)
  File "retry_decorator.py", line 33, in test_fail
    raise Exception("Fail")
Exception: Fail
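The waits in the output above follow directly from the parameters: with `tries=4` there are three waits, starting at `delay` and multiplied by `backoff` each time. Here is a small sketch of that arithmetic in Python 3 syntax; `retry_delays` is a hypothetical helper, not part of the decorator.

```python
def retry_delays(tries=4, delay=3, backoff=2):
    """Return the waits (in seconds) between attempts."""
    return [delay * backoff ** n for n in range(tries - 1)]

print(retry_delays())       # [3, 6, 12]
print(sum(retry_delays()))  # 21 seconds of waiting in the worst case
```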

Try a "success" case

@retry(Exception, tries=4)
def test_success(text):
    print "Success: ", text

test_success("it works!")

Results:

Success:  it works!

Try a "random fail" case

import random

@retry(Exception, tries=4)
def test_random(text):
    x = random.random()
    if x < 0.5:
        raise Exception("Fail")
    else:
        print "Success: ", text

test_random("it works!")

Results:

Fail, Retrying in 3 seconds...
Success:  it works!

Try handling multiple exceptions

Added 2010-04-27

import random

@retry((NameError, IOError), tries=20, delay=1, backoff=1)
def test_multiple_exceptions():
    x = random.random()
    if x < 0.40:
        raise NameError("NameError")
    elif x < 0.80:
        raise IOError("IOError")
    else:
        raise KeyError("KeyError")

test_multiple_exceptions()

Results:

IOError, Retrying in 1 seconds...
NameError, Retrying in 1 seconds...
IOError, Retrying in 1 seconds...
IOError, Retrying in 1 seconds...
NameError, Retrying in 1 seconds...
IOError, Retrying in 1 seconds...
NameError, Retrying in 1 seconds...
NameError, Retrying in 1 seconds...
NameError, Retrying in 1 seconds...
IOError, Retrying in 1 seconds...
Traceback (most recent call last):
  File "retry_decorator.py", line 61, in <module>
    test_multiple_exceptions("hello")
  File "retry_decorator.py", line 14, in f_retry
    f(*args, **kwargs)
  File "retry_decorator.py", line 56, in test_multiple_exceptions
    raise KeyError("KeyError")
KeyError: 'KeyError'

Unit tests

Added 2013-01-22. Note: Python 2.7 is required to run the tests.

import logging
import unittest

from decorators import retry


class RetryableError(Exception):
    pass


class AnotherRetryableError(Exception):
    pass


class UnexpectedError(Exception):
    pass


class RetryTestCase(unittest.TestCase):

    def test_no_retry_required(self):
        self.counter = 0

        @retry(RetryableError, tries=4, delay=0.1)
        def succeeds():
            self.counter += 1
            return 'success'

        r = succeeds()

        self.assertEqual(r, 'success')
        self.assertEqual(self.counter, 1)

    def test_retries_once(self):
        self.counter = 0

        @retry(RetryableError, tries=4, delay=0.1)
        def fails_once():
            self.counter += 1
            if self.counter < 2:
                raise RetryableError('failed')
            else:
                return 'success'

        r = fails_once()
        self.assertEqual(r, 'success')
        self.assertEqual(self.counter, 2)

    def test_limit_is_reached(self):
        self.counter = 0

        @retry(RetryableError, tries=4, delay=0.1)
        def always_fails():
            self.counter += 1
            raise RetryableError('failed')

        with self.assertRaises(RetryableError):
            always_fails()
        self.assertEqual(self.counter, 4)

    def test_multiple_exception_types(self):
        self.counter = 0

        @retry((RetryableError, AnotherRetryableError), tries=4, delay=0.1)
        def raise_multiple_exceptions():
            self.counter += 1
            if self.counter == 1:
                raise RetryableError('a retryable error')
            elif self.counter == 2:
                raise AnotherRetryableError('another retryable error')
            else:
                return 'success'

        r = raise_multiple_exceptions()
        self.assertEqual(r, 'success')
        self.assertEqual(self.counter, 3)

    def test_unexpected_exception_does_not_retry(self):

        @retry(RetryableError, tries=4, delay=0.1)
        def raise_unexpected_error():
            raise UnexpectedError('unexpected error')

        with self.assertRaises(UnexpectedError):
            raise_unexpected_error()

    def test_using_a_logger(self):
        self.counter = 0

        sh = logging.StreamHandler()
        logger = logging.getLogger(__name__)
        logger.addHandler(sh)

        @retry(RetryableError, tries=4, delay=0.1, logger=logger)
        def fails_once():
            self.counter += 1
            if self.counter < 2:
                raise RetryableError('failed')
            else:
                return 'success'

        fails_once()


if __name__ == '__main__':
    unittest.main()

Code / License

This code is also on GitHub at https://github.com/saltycrane/retry-decorator. It is BSD licensed.

Notes on switching my Djangos to mod_wsgi

2009-10-22T00:44:17-07:00

I'm slowly trying to make my Django web servers conform to current best practices. I've set up an Nginx reverse proxy for serving static files, started using virtualenv to isolate my Python environments, and migrated my database to PostgreSQL. I ultimately want to implement memcached+Nginx caching in my reverse proxy, but the next task on my to-do list is switching from mod_python to mod_wsgi.

Within the past year (or maybe before), mod_wsgi has become the preferred method for serving Django applications. I also originally thought switching from mod_python to mod_wsgi would save me some much needed memory on my 256MB VPS. But after trying it out, running with a single Apache process in each case, the memory footprint was about the same. Even switching from mod_wsgi's embedded mode to daemon mode didn't make a significant difference. Likely the performance is better with mod_wsgi, though.

Here are my notes on installing mod_wsgi.

Configuration References

Django docs: How to use Django with Apache and mod_wsgi
mod_wsgi docs: Quick Configuration Guide
A good blog article: Configs for nginx and Apache with mod_wsgi

Advice from mod_wsgi author Graham Dumpleton

Install mod_wsgi and apache mpm-worker

I'm not 100% sure about the prefork vs. worker MPM, but Graham Dumpleton favors the worker MPM.

sudo apt-get install libapache2-mod-wsgi
sudo apt-get install apache2-mpm-worker

Create .wsgi application file

My virtualenv is located at /srv/python-environments/saltycrane. My Django settings file is at /srv/SaltyCrane/iwiwdsmi/settings.py.

/srv/SaltyCrane/saltycrane.wsgi:

import os
import sys
import site

site.addsitedir('/srv/python-environments/saltycrane/lib/python2.5/site-packages')

os.environ['DJANGO_SETTINGS_MODULE'] = 'iwiwdsmi.settings'

sys.path.append('/srv/SaltyCrane')

import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()

Edit Apache's httpd.conf file

I went back and forth between using embedded mode and daemon mode. I've ended up with embedded mode for now since it seems to use a tad less memory and is supposed to be a little bit faster. However, Graham Dumpleton seems to recommend daemon mode for people on VPSs. I may change my mind again later. To use daemon mode, I just need to uncomment the WSGIDaemonProcess and WSGIProcessGroup lines. I have StartServers set to 1 because I can only afford to have one Apache process running. This assumes nginx is proxying requests to Apache. For more on my nginx setup, see here.

Edit /etc/apache2/httpd.conf:

<IfModule mpm_worker_module>
    StartServers 1
    ServerLimit 1
    ThreadsPerChild 5
    ThreadLimit 5
    MinSpareThreads 5
    MaxSpareThreads 5
    MaxClients 5
    MaxRequestsPerChild 500
</IfModule>

KeepAlive Off
NameVirtualHost 127.0.0.1:8080
Listen 8080

<VirtualHost 127.0.0.1:8080>
    ServerName www.saltycrane.com
    # WSGIDaemonProcess saltycrane.com processes=1 threads=5 display-name=%{GROUP}
    # WSGIProcessGroup saltycrane.com
    WSGIScriptAlias / /srv/SaltyCrane/saltycrane.wsgi
</VirtualHost>

<VirtualHost 127.0.0.1:8080>
    ServerName supafu.com
    # WSGIDaemonProcess supafu.com processes=1 threads=5 display-name=%{GROUP}
    # WSGIProcessGroup supafu.com
    WSGIScriptAlias / /srv/Supafu/supafu.wsgi
</VirtualHost>

<VirtualHost 127.0.0.1:8080>
    ServerName handsoncards.com
    # WSGIDaemonProcess handsoncards.com processes=1 threads=5 display-name=%{GROUP}
    # WSGIProcessGroup handsoncards.com
    WSGIScriptAlias / /srv/HandsOnCards/handsoncards.wsgi
</VirtualHost>

Restart Apache

sudo /etc/init.d/apache2 restart
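After switching between embedded and daemon mode, it is handy to confirm which one is actually serving requests. Here is a minimal sketch of a throwaway WSGI application for that. It relies on mod_wsgi putting a mod_wsgi.process_group key in the WSGI environ (an empty string in embedded mode, the process group name in daemon mode), which is my understanding of the mod_wsgi docs. Pointing a WSGIScriptAlias at a file containing this is an assumption and not part of my real config.

```python
def application(environ, start_response):
    """Report whether this request was handled in embedded or daemon mode."""
    # Empty string means embedded mode; otherwise it is the daemon group name.
    group = environ.get("mod_wsgi.process_group", "")
    if group:
        text = "daemon mode: %s\n" % group
    else:
        text = "embedded mode\n"
    body = text.encode("utf-8")
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

With the WSGIDaemonProcess and WSGIProcessGroup lines commented out as above, I would expect this to report embedded mode.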

How to capture stdout in real-time with Python

2009-10-12T16:54:59-07:00

This solution is thanks to this article.

import subprocess

def myrun(cmd):
    """from http://blog.kagesenshi.org/2008/02/teeing-python-subprocesspopen-output.html
    """
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    stdout = []
    while True:
        line = p.stdout.readline()
        stdout.append(line)
        print line,
        if line == '' and p.poll() is not None:
            break
    return ''.join(stdout)
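For reference, here is a sketch of the same idea written for Python 3. The function name run_and_tee is my own. It assumes the command is passed as an argument list (no shell) and that the output is text, and it echoes each line as soon as the child process writes it.

```python
import subprocess
import sys


def run_and_tee(cmd):
    """Run cmd, echo its output line by line as it arrives, and return it all."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as in myrun()
        universal_newlines=True,   # decode bytes to str
    )
    captured = []
    # readline() returns "" only at end of file, which ends the loop.
    for line in iter(proc.stdout.readline, ""):
        sys.stdout.write(line)
        sys.stdout.flush()
        captured.append(line)
    proc.stdout.close()
    proc.wait()
    return "".join(captured)


output = run_and_tee([sys.executable, "-c", "print('hello')"])
```

Note that the child process may buffer its own output when it is not connected to a terminal, so "real-time" also depends on the command being run.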

Notes on Python logging

2009-10-07T12:55:11-07:00

mylogging.py:

import logging
import sys

DEBUG_LOG_FILENAME = '/var/log/my-debug.log'
WARNING_LOG_FILENAME = '/var/log/my-warning.log'

# set up formatting
formatter = logging.Formatter('[%(asctime)s] %(levelno)s (%(process)d) %(module)s: %(message)s')

# set up logging to STDOUT for all levels DEBUG and higher
sh = logging.StreamHandler(sys.stdout)
sh.setLevel(logging.DEBUG)
sh.setFormatter(formatter)

# set up logging to a file for all levels DEBUG and higher
fh = logging.FileHandler(DEBUG_LOG_FILENAME)
fh.setLevel(logging.DEBUG)
fh.setFormatter(formatter)

# set up logging to a file for all levels WARNING and higher
fh2 = logging.FileHandler(WARNING_LOG_FILENAME)
fh2.setLevel(logging.WARNING)
fh2.setFormatter(formatter)

# create Logger object
mylogger = logging.getLogger('MyLogger')
mylogger.setLevel(logging.DEBUG)
mylogger.addHandler(sh)
mylogger.addHandler(fh)
mylogger.addHandler(fh2)

# create shortcut functions
debug = mylogger.debug
info = mylogger.info
warning = mylogger.warning
error = mylogger.error
critical = mylogger.critical

testlogging.py:

from mylogging import debug, info, warning, error

debug('debug message')
info('info message')
warning('warning message')
error('error message')

Run it:

python testlogging.py

Console output:

[2009-10-07 12:45:59,713] 10 (22886) testlogging: debug message
[2009-10-07 12:45:59,718] 20 (22886) testlogging: info message
[2009-10-07 12:45:59,718] 30 (22886) testlogging: warning message
[2009-10-07 12:45:59,719] 40 (22886) testlogging: error message

cat /var/log/my-debug.log:

[2009-10-07 12:45:59,713] 10 (22886) testlogging: debug message
[2009-10-07 12:45:59,718] 20 (22886) testlogging: info message
[2009-10-07 12:45:59,718] 30 (22886) testlogging: warning message
[2009-10-07 12:45:59,719] 40 (22886) testlogging: error message

cat /var/log/my-warning.log:

[2009-10-07 12:45:59,718] 30 (22886) testlogging: warning message
[2009-10-07 12:45:59,719] 40 (22886) testlogging: error message
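The difference between the two files comes from the level set on each handler: the logger passes every record at DEBUG and higher to all of its handlers, and each handler then drops records below its own level. Here is a small sketch of that filtering that writes to in-memory streams instead of files under /var/log. It assumes Python 3, and the logger name HandlerLevelDemo is made up for the example.

```python
import io
import logging

debug_stream = io.StringIO()
warning_stream = io.StringIO()

# One handler accepts DEBUG and higher, the other only WARNING and higher.
debug_handler = logging.StreamHandler(debug_stream)
debug_handler.setLevel(logging.DEBUG)
warning_handler = logging.StreamHandler(warning_stream)
warning_handler.setLevel(logging.WARNING)

demo_logger = logging.getLogger("HandlerLevelDemo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False  # keep the demo output away from the root logger
demo_logger.addHandler(debug_handler)
demo_logger.addHandler(warning_handler)

demo_logger.debug("debug message")
demo_logger.info("info message")
demo_logger.warning("warning message")
demo_logger.error("error message")

debug_lines = debug_stream.getvalue().splitlines()
warning_lines = warning_stream.getvalue().splitlines()
print(debug_lines)    # all four messages
print(warning_lines)  # only the warning and error messages
```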

Note: if you get a permission denied error for the log file, you can do this:

sudo touch /var/log/my-debug.log
sudo touch /var/log/my-warning.log
sudo chmod 666 /var/log/my-debug.log
sudo chmod 666 /var/log/my-warning.log

Documentation

Notes on Python Fabric 0.9b1

2009-10-04T22:36:28-07:00

Fabric is a Python package used for deploying websites or generally running commands on a remote server. I first used Fabric about a year ago and thought it was great. Since then, Fabric has procured a new maintainer, a new domain, and a few new revisions.

Here are my notes on installing the latest stable version (0.9b1) on Ubuntu Jaunty and running a simple example.

Install Fabric 0.9b1

Install Easy Install & pip

sudo apt-get install python-setuptools python-dev build-essential

sudo easy_install -U pip

Install Fabric

Note: According to the Fabric website, the latest version of the prerequisite Python library, Paramiko, has a bug, so it is recommended to install the previous version, 1.7.4, instead. This can be accomplished by creating a requirements file for pip:
```
http://www.lag.net/paramiko/download/paramiko-1.7.4.tar.gz
http://git.fabfile.org/cgit.cgi/fabric/snapshot/fabric-0.9b1.tar.gz
```
To install, use the pip install command with the -r option and the path to your requirements file. For convenience, you can install Fabric using my requirements file:
```
sudo pip install -r http://www.saltycrane.com/site_media/code/fabric-requirements.txt
```

Using Fabric

Create a file called fabfile.py in ~/myproject:

from __future__ import with_statement # needed for python 2.5
from fabric.api import env, run

def ec2():
    env.hosts = ['ec2-65-234-55-183.compute-1.amazonaws.com']
    env.user = 'saltycrane'
    env.key_filename = '/path/to/my/id_ssh_keyfile'

def ps_apache():
    run('ps -e -O rss,pcpu | grep apache')

Run it

cd ~/myproject
fab ec2 ps_apache

Results:

[ec2-65-234-55-183.compute-1.amazonaws.com] run: ps -e -O rss,pcpu | grep apache
[ec2-65-234-55-183.compute-1.amazonaws.com] err: stdin: is not a tty
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  3571 10996  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  5047 28352  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  5048 27756  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  5049 23752  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  5050 27344  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  5055 27344  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  5166 28404  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  5167 27900  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  9365  1208  0.0 S ?        00:00:00 /bin/bash -l -c ps -e -O rss,pcpu | grep apache

Done.
Disconnecting from ec2-65-234-55-183.compute-1.amazonaws.com... done.

List of available `env` options

I extracted this list from state.py (0.9b1). Or view the tip version

env.reject_unknown_hosts = True         # reject unknown hosts
env.disable_known_hosts = True          # do not load user known_hosts file
env.user = 'username'                   # username to use when connecting to remote hosts
env.password = 'mypassword'             # password for use with authentication and/or sudo
env.hosts = ['host1.com', 'host2.com']  # comma-separated list of hosts to operate on
env.roles = ['web']                     # comma-separated list of roles to operate on
env.key_filename = 'id_rsa'             # path to SSH private key file. May be repeated.
env.fabfile = '../myfabfile.py'         # name of fabfile to load, e.g. 'fabfile.py' or '../other.py'
env.warn_only = True                    # warn, instead of abort, when commands fail
env.shell = '/bin/sh'                   # specify a new shell, defaults to '/bin/bash -l -c'
env.rcfile = 'myfabconfig'              # specify location of config file to use
env.hide = ['everything']               # comma-separated list of output levels to hide
env.show = ['debug']                    # comma-separated list of output levels to show
env.version = '1.0'
env.sudo_prompt = 'sudo password:'
env.use_shell = False
env.roledefs = {'web': ['www1', 'www2', 'www3'],
                'dns': ['ns1', 'ns2'],
                }
env.cwd = 'mydir'

How to check the status code of a command

To check the return code of your command, set the env.warn_only option to True and check the return_code attribute of the object returned from run(). For example:

def ec2():
    env.hosts = ['ec2-65-234-55-183.compute-1.amazonaws.com']
    env.user = 'saltycrane'
    env.key_filename = '/path/to/my/id_ssh_keyfile'
    env.warn_only = True

def getstatus():
    output = run('ls non_existent_file')
    print 'output:', output
    print 'failed:', output.failed
    print 'return_code:', output.return_code

fab ec2 getstatus

[ec2-65-234-55-183.compute-1.amazonaws.com] run: ls non_existent_file
[ec2-65-234-55-183.compute-1.amazonaws.com] err: ls: cannot access non_existent_file: No such file or directory

Warning: run() encountered an error (return code 2) while executing 'ls non_existent_file'

output:
failed: True
return_code: 2

Done.
Disconnecting from ec2-65-234-55-183.compute-1.amazonaws.com... done.

Other notes

Error message: paramiko.SSHException: Channel closed.

Try using Paramiko version 1.7.4 instead of 1.7.5. See http://www.mail-archive.com/fab-user@nongnu.org/msg00844.html.

How to check the version of Paramiko:

$ python
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import paramiko
>>> paramiko.__version__
'1.7.5 (Ernest)'

Error message: Fatal error: No existing session

This occurred when I used the wrong username.

Notes on working with files and directories in Python

2009-09-18T16:53:07-07:00

Documentation:

How to list files in a directory

See my separate post: How to list the contents of a directory with Python

How to rename a file: os.rename

Documentation: http://docs.python.org/library/os.html#os.rename

import os
os.rename("/tmp/oldname", "/tmp/newname")

How to imitate mkdir -p

import os
if not os.path.exists(directory):
    os.makedirs(directory)
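The check-then-create version above can still fail if something else creates the directory between the os.path.exists() call and the os.makedirs() call. Here is a sketch of a variant that tries to create the directory and ignores only the "already exists" error. The name mkdir_p and the temporary directory used in the demo are my own choices.

```python
import errno
import os
import tempfile


def mkdir_p(path):
    """Create path and any missing parents; do nothing if it already exists."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Ignore "already exists" only when the existing path is a directory.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise


base = tempfile.mkdtemp()
target = os.path.join(base, "a", "b", "c")
mkdir_p(target)
mkdir_p(target)  # the second call is a no-op instead of an error
print(os.path.isdir(target))  # True
```

On Python 3.2 and later, os.makedirs(path, exist_ok=True) does the same thing in one call.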

How to imitate cp -r (except copy only files including hidden dotfiles)

What didn't work for my purpose:

import os

def _copy_dash_r_filesonly(src, dst):
    """Like "cp -r src/* dst" but copy files only (don't include directories)
    (and include hidden dotfiles also)
    """
    for (path, dirs, files) in os.walk(src):
        for filename in files:
            srcfilepath = os.path.join(path, filename)
            dstfilepath = os.path.join(dst, os.path.relpath(srcfilepath, src))
            dstdir = os.path.dirname(dstfilepath)
            if not os.path.exists(dstdir):
                run('mkdir -p %s' % dstdir)
            run('cp -f %s %s' % (srcfilepath, dstfilepath))
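The function above calls a run() helper that is not defined in this post. Here is a sketch of a pure-Python variant that needs no helper, assuming both directories are on the local filesystem. It swaps the shell commands for os.makedirs and shutil.copy2. The name copy_files_only and the demo files are made up.

```python
import os
import shutil
import tempfile


def copy_files_only(src, dst):
    """Copy every file under src (dotfiles included) to the same relative path
    under dst. Directories are created only when a file needs them, so empty
    directories are not copied.
    """
    for path, dirs, files in os.walk(src):
        for filename in files:
            srcfilepath = os.path.join(path, filename)
            dstfilepath = os.path.join(dst, os.path.relpath(srcfilepath, src))
            dstdir = os.path.dirname(dstfilepath)
            if not os.path.isdir(dstdir):
                os.makedirs(dstdir)
            shutil.copy2(srcfilepath, dstfilepath)  # copies data and metadata


# Demo on a throwaway tree: two files, one dotfile, one empty directory.
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "sub"))
os.makedirs(os.path.join(src, "empty"))
for name in ("a.txt", ".hidden", os.path.join("sub", "b.txt")):
    with open(os.path.join(src, name), "w") as fp:
        fp.write("data")

copy_files_only(src, dst)
copied = sorted(
    os.path.relpath(os.path.join(path, name), dst)
    for path, dirs, files in os.walk(dst)
    for name in files
)
print(copied)
```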

Python optparse example

2009-09-16T00:04:39-07:00

Optparse seems like a pretty cool module for processing command line options and arguments in Python. It is intended to be an improvement over the old getopt module. Optparse supports short style options like -x, long style options like --xhtml and positional arguments. Optparse also makes it easy to add default options and help text. Below is my optparse example. For more information, see the optparse documentation.

Here is my example file, optparse_ex.py:

#!/usr/bin/env python

from optparse import OptionParser

def main():
    parser = OptionParser(usage="usage: %prog [options] filename",
                          version="%prog 1.0")
    parser.add_option("-x", "--xhtml",
                      action="store_true",
                      dest="xhtml_flag",
                      default=False,
                      help="create a XHTML template instead of HTML")
    parser.add_option("-c", "--cssfile",
                      action="store", # optional because action defaults to "store"
                      dest="cssfile",
                      default="style.css",
                      help="CSS file to link",)
    (options, args) = parser.parse_args()

    if len(args) != 1:
        parser.error("wrong number of arguments")

    print options
    print args

if __name__ == '__main__':
    main()

Note: If you do not specify a dest variable, optparse will use the long option name you specify (removing the initial two dashes and substituting underscores for dashes in the middle of the name). If you do not specify a long option, it will use the short option.
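Here is a short sketch of the dest rule described in the note. The option names --output-dir and -q are made up for the example, and the argument list is passed to parse_args() directly so it can be run without a command line.

```python
from optparse import OptionParser

parser = OptionParser()
# No dest given: derived from the long option name, so dest is "output_dir".
parser.add_option("-o", "--output-dir")
# No dest and no long option: derived from the short option, so dest is "q".
parser.add_option("-q", action="store_true", default=False)

options, args = parser.parse_args(["--output-dir", "build", "-q", "myfile.html"])
print(options.output_dir)  # build
print(options.q)           # True
print(args)                # ['myfile.html']
```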

displaying the help message

./optparse_ex.py -h

Usage: optparse_ex.py [options] filename

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -x, --xhtml           create a XHTML template instead of HTML
  -c CSSFILE, --cssfile=CSSFILE
                        CSS file to link

wrong number of arguments

./optparse_ex.py

Usage: optparse_ex.py [options] filename

optparse_ex.py: error: wrong number of arguments

using default options

./optparse_ex.py myfile.html

{'xhtml_flag': False, 'cssfile': 'style.css'}
['myfile.html']

specifying command line options

./optparse_ex.py -x -c mystyle.css myfile.html

{'xhtml_flag': True, 'cssfile': 'mystyle.css'}
['myfile.html']

specifying long style command line options

./optparse_ex.py --xhtml --cssfile=mystyle.css myfile.html

{'xhtml_flag': True, 'cssfile': 'mystyle.css'}
['myfile.html']

How to specify a discrete list of choices for an option

from optparse import OptionParser

def main():
    parser = OptionParser(usage='usage: %prog [options] ')
    parser.add_option('-e', '--env',
                      type='choice',
                      action='store',
                      dest='environment',
                      choices=['production', 'staging', 'testing',],
                      default='production',
                      help='Environment to run on',)
    (options, args) = parser.parse_args()
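To see what the choice type does with good and bad values, here is a sketch that wraps the same option in a function and feeds it argument lists directly. The function name parse_env is my own. When the value is not one of the choices, optparse prints an error to stderr and exits with status 2, which shows up in Python as SystemExit.

```python
from optparse import OptionParser


def parse_env(argv):
    """Parse argv and return the selected environment name."""
    parser = OptionParser(usage="usage: %prog [options]")
    parser.add_option("-e", "--env",
                      type="choice",
                      dest="environment",
                      choices=["production", "staging", "testing"],
                      default="production",
                      help="Environment to run on")
    options, args = parser.parse_args(argv)
    return options.environment


print(parse_env([]))                  # production (the default)
print(parse_env(["-e", "staging"]))   # staging

try:
    parse_env(["--env", "dev"])       # not in choices
except SystemExit as exc:
    exit_code = exc.code
print(exit_code)                      # 2
```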

Notes on migrating this blog from SQLite to PostgreSQL using Django

2009-08-07T23:54:14-07:00

Here are my notes on migrating this blog from SQLite to PostgreSQL. For the parts of my database that were in proper order, the migration was made very easy using Django's ./manage.py dumpdata and ./manage.py loaddata commands. However, the database tables used for storing the comments on this blog were kind of screwed up because I had previously migrated them from my old Blogger blog. So I had to write another (not so pretty) script for that.

Thanks to this article for showing me how to use Django's dumpdata and loaddata for this migration.

Create a new Postgres database

See my previous notes for creating a Postgres database. I named the new database "saltycrane_db" owned by "django_user" with password "my_password".

Clone and modify Django project

Clone my SaltyCrane project

hg clone SaltyCrane SaltyCraneNew

cp -rp SaltyCrane SaltyCraneNew

Edit settings.py:

DATABASE_ENGINE = 'postgresql_psycopg2'
DATABASE_NAME = 'saltycrane_db'
DATABASE_USER = 'django_user'
DATABASE_PASSWORD = 'my_password'

Create database tables
```
python manage.py syncdb
```

Migrate data

Create JSON dumps from the existing SQLite database for my 3 Django apps (myblogapp, comments, and tagging):

cd /srv/SaltyCrane/iwiwdsmi
./manage.py dumpdata myblogapp > dump_myblogapp.json
./manage.py dumpdata comments > dump_comments.json
./manage.py dumpdata tagging > dump_tagging.json

Load the JSON data dumps to the new Postgres database:

cd /srv/SaltyCraneNew/iwiwdsmi
./manage.py loaddata ../../SaltyCrane/iwiwdsmi/dump_tagging.json
./manage.py loaddata ../../SaltyCrane/iwiwdsmi/dump_myblogapp.json

However, loading comments didn't work because I had some missing fields so...

Migration script

...I wrote a migration script:

import setup_environment

import simplejson
from pprint import pprint
from django.contrib.comments.models import Comment
from django.contrib.contenttypes.models import ContentType
from iwiwdsmi.myblogapp.models import Post

JSON_FILENAME = "/srv/SaltyCrane/iwiwdsmi/dump_comments.json"
ct = ContentType.objects.get(name='post', app_label='myblogapp')
N_OLD_COMMENTS = 4000

def main():
    delete_all_comments()
    create_dummy_comments()
    pydata = open_json_file(JSON_FILENAME)
    save_items_to_database(pydata)
    delete_dummy_comments()

def delete_all_comments():
    """ Clears the database of all comments.
    """
    comments = Comment.objects.all()
    comments.delete()
    print "All comments deleted."

def create_dummy_comments():
    """ Create a bunch of filler dummy comments
    """
    for i in range(N_OLD_COMMENTS):
        c = Comment()
        c.comment = "Filler comment."
        c.content_type = ct
        c.ip_address = None
        c.is_public = False
        c.is_removed = False
        c.object_pk = 243
        c.site_id = 1
        c.user_email = "filler@email.com"
        c.user_name = "Filler"
        c.save()
    print "Filler comments created."

def delete_dummy_comments():
    comments = Comment.objects.filter(is_public=False)
    comments.delete()
    print "Dummy comments deleted."

def open_json_file(filename):
    """ open the json file and return the python data structure
    """
    json_fp = open(filename)
    pydata = simplejson.load(json_fp)
    return pydata

def save_items_to_database(pydata):
    """ Process the python data structure and save to the new database
    """
    for item in pydata:
        pprint(item)
        pk = item['pk']
        item = item['fields']
        if item['is_public'] and not item['is_removed']:
            c = Comment.objects.get(id=pk)
            c.comment = item['comment']
            c.content_type = ct
            c.ip_address = get_ip_address(item['ip_address'])
            c.is_public = item['is_public']
            c.is_removed = item['is_removed']
            c.object_pk = item['object_pk']
            c.site_id = 1
            c.submit_date = item['submit_date']
            c.user_id = item['user']
            c.user_email = item['user_email']
            c.user_name = item['user_name']
            c.user_url = item['user_url']
            c.save()

def get_ip_address(ip):
    """ Handle bad input for IP addresses
    """
    if ip == "" or ip == "unknown":
        return None
    else:
        return ip

if __name__ == '__main__':
    main()
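Before writing a script like this, it can help to find out which records in the dump are actually missing fields. Here is a sketch that scans the list-of-dicts structure produced by dumpdata (each item has "pk", "model", and "fields" keys). The function name, the list of required fields, and the sample records are all made up for illustration and are not taken from my real dump.

```python
import json

# Hypothetical list of fields every comment record is expected to have.
REQUIRED_FIELDS = ("comment", "object_pk", "submit_date", "user_name")


def find_incomplete_records(records, required_fields):
    """Return a (pk, missing_field_names) pair for each incomplete record."""
    incomplete = []
    for record in records:
        fields = record.get("fields", {})
        missing = [name for name in required_fields if name not in fields]
        if missing:
            incomplete.append((record.get("pk"), missing))
    return incomplete


# Made-up sample data in the dumpdata JSON layout.
sample_dump = json.loads("""
[
  {"pk": 1, "model": "comments.comment",
   "fields": {"comment": "Nice post", "object_pk": "243",
              "submit_date": "2009-08-01 10:00:00", "user_name": "Ann"}},
  {"pk": 2, "model": "comments.comment",
   "fields": {"comment": "Thanks", "object_pk": "243"}}
]
""")

incomplete = find_incomplete_records(sample_dump, REQUIRED_FIELDS)
print(incomplete)  # [(2, ['submit_date', 'user_name'])]
```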

Using psycopg2 with virtualenv on Ubuntu ~~Jaunty~~ Maverick

2009-07-31T22:56:10-07:00

Update 2009-11-02: Well I am dumb-- psycopg2 can be installed with pip/easy_install. The reason I got the error before was because I didn't have the required dependencies installed. On Ubuntu, I used apt-get build-dep. So, here's the summary:

Update 2009-11-11: My update doesn't work. See comments #4 and #5 below.

Update 2010-05-17: Here is what finally worked for me using Python 2.6 on Ubuntu 10.04 Lucid Lynx using virtualenv 1.4.8 and pip 0.7.1. Thanks to Daniel for the final piece of the solution.

Update 2011-05-13: Looks like the mx+virtualenv issue has been reported and fixed; the fix is to be released in 2.4.2 (via @psycopg's tweet).

Update 2011-11-05: psycopg2 2.4.2 has been released with the mxDateTime fix so the install is very easy now.

Install dependencies

$ sudo apt-get build-dep python-psycopg2

Install psycopg2 in a virtualenv

$ virtualenv --no-site-packages myenv 
$ source myenv/bin/activate 
$ pip install psycopg2

No longer needed: ~~$ easy_install -i http://downloads.egenix.com/python/index/ucs4/ egenix-mx-base~~

*** THE REST OF THIS POST DESCRIBES MY INITIAL OUTDATED SOLUTION. ***

I recently started using virtualenv for creating isolated Python environments. Virtualenv has made it easy to manage different Python configurations for different websites, but I was slowed down a bit trying to use psycopg2, the Python-PostgreSQL database adapter, with virtualenv after upgrading to Ubuntu 9.04 Jaunty Jackalope.

Currently, virtualenv (1.3.3) doesn't find Ubuntu Jaunty's Python packages. In Ubuntu Jaunty, the default Python version changed from 2.5 to 2.6. More importantly, the site-packages directory was changed to dist-packages. Prior to Ubuntu 9.04, Ubuntu's Python packages, such as python-psycopg2, were installed to /usr/lib/python2.5/site-packages. Now Ubuntu's Python packages are installed to /usr/lib/python2.6/dist-packages. (See this discussion at the virtualenv group for more information.)

As a result of this change, virtualenv (as of 1.3.3) doesn't find Ubuntu's Python packages installed using apt-get. My solution was to create symlinks to the desired packages and egg-info files in site-packages. I'm not sure if this is the proper way to handle this. If there is a better solution, please let me know. One advantage of using this method is that I don't need to clutter my virtualenv with all the packages that have accumulated in my global ~~site-packages~~ dist-packages.
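To see which package directories a given interpreter actually searches, here is a small sketch that can be run with the system Python or with a virtualenv's Python. It only reads from the standard sys and sysconfig modules. The variable names are my own, and the output will differ from machine to machine.

```python
import sys
import sysconfig

# Directory where this interpreter installs pure-Python packages.
purelib = sysconfig.get_paths()["purelib"]
print("purelib: " + purelib)

# Entries on the import path that are package directories of either flavor.
package_dirs = [
    p for p in sys.path
    if p.endswith(("site-packages", "dist-packages"))
]
for p in package_dirs:
    print(p)
```

If a package installed with apt-get lives in a directory that is not listed here, the interpreter will not find it.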

Install easy_install, pip, and virtualenv

sudo apt-get install python-setuptools python-dev build-essential
sudo easy_install -U pip
sudo pip install -U virtualenv

Install Ubuntu's psycopg2 package

sudo apt-get install python-psycopg2

Symlink the psycopg2 (and mxDateTime) files

sudo mkdir /usr/lib/python2.6/site-packages
sudo ln -s /usr/lib/python2.6/dist-packages/psycopg2 /usr/lib/python2.6/site-packages
sudo ln -s /usr/lib/python2.6/dist-packages/psycopg2-2.0.8.egg-info /usr/lib/python2.6/site-packages
sudo ln -s /usr/lib/python2.6/dist-packages/mx /usr/lib/python2.6/site-packages

Create a virtualenv

virtualenv myenv

Check what's available

pip freeze -E myenv

Results:

psycopg2==2.0.8
wsgiref==0.1.2

Note: you might wonder why I didn't do a pip install -E myenv psycopg2. I tried this, but got an error. Maybe psycopg2 doesn't support pip/easy_install? Here is my error message:

Downloading/unpacking psycopg2
  Downloading psycopg2-2.0.11.tar.gz (255Kb): 255Kb downloaded
  Running setup.py egg_info for package psycopg2
    error: No such file or directory
    Complete output from command python setup.py egg_info:
    running egg_info

creating pip-egg-info/psycopg2.egg-info

writing pip-egg-info/psycopg2.egg-info/PKG-INFO

writing top-level names to pip-egg-info/psycopg2.egg-info/top_level.txt

writing dependency_links to pip-egg-info/psycopg2.egg-info/dependency_links.txt

writing manifest file 'pip-egg-info/psycopg2.egg-info/SOURCES.txt'

warning: manifest_maker: standard file '-c' not found

error: No such file or directory

----------------------------------------
Command python setup.py egg_info failed with error code 1
Storing complete log in ./pip-log.txt
Complete output from command temp/bin/python /usr/local/lib/python2.6/dist-packages/pip-0.4-py2.6.egg/pip.py install -E temp psycopg2 temp ___VENV_RESTART___:

----------------------------------------
Traceback (most recent call last):
  File "/usr/local/bin/pip", line 3, in <module>
    pip.main()
  File "/usr/local/lib/python2.6/dist-packages/pip-0.4-py2.6.egg/pip.py", line 926, in main
    return command.main(initial_args, args[1:], options)
  File "/usr/local/lib/python2.6/dist-packages/pip-0.4-py2.6.egg/pip.py", line 258, in main
    restart_in_venv(options.venv, site_packages, complete_args)
  File "/usr/local/lib/python2.6/dist-packages/pip-0.4-py2.6.egg/pip.py", line 1009, in restart_in_venv
    call_subprocess([python, file] + args + [base, '___VENV_RESTART___'])
  File "/usr/local/lib/python2.6/dist-packages/pip-0.4-py2.6.egg/pip.py", line 3643, in call_subprocess
    % (command_desc, proc.returncode))
pip.InstallationError: Command temp/bin/python /usr/local/lib/python2.6/dist-packages/pip-0.4-py2.6.egg/pip.py install -E temp psycopg2 temp ___VENV_RESTART___ failed with error code 1

Install wmii snapshot 20090703 on Ubuntu Jaunty

2009-07-13T09:30:34-07:00

On Friday I installed the latest snapshot of wmii, the lightweight, tiling, dynamic, scriptable, keyboard-navigable window manager. (See my wmii tag for more posts about wmii.) Wmii organizes windows into columns. Traditionally there have been three modes for each column: default, stack, and max. "Default" shows all windows in a column. "Stack" shows only one window at a time with the title bars of the other windows showing. "Max" shows only one window without any other title bars. I usually use two columns, each in stack mode. To navigate windows, I use Mod4+j/k to move among windows in a column and Mod4+h/l to move between columns. This has worked well for me, but sometimes I wanted to display two windows in a column while keeping the rest of the windows hidden. Thankfully, the latest snapshot of wmii now includes a new mode which solves my problem. It is a hybrid of the "default" and "stack" modes. It is called "default-max" and it allows for multiple stacks of windows per column. There is also a "default+max" mode which is like a hybrid of the old "default" and "max" modes. Additionally, these modes now apply to the floating layer as well.

Another really cool thing about the latest wmii snapshot is that it now includes a Python library to interface with the wmii filesystem and a wmiirc written in Python. This should make it a lot easier to script wmii. This is exciting stuff. Thanks to Kris Maglione and the wmii contributors.

Here are my notes for installing wmii snapshot 20090703 on Ubuntu 9.04 Jaunty.

Install prerequisites

$ sudo apt-get install build-essential xorg-dev

$ sudo apt-get install dwm-tools

Create a symbolic link so that the build can find the freetype headers:

$ sudo ln -s /usr/include/freetype2/freetype /usr/include/freetype

Install wmii

Download wmii+ixp-snap20090703.tgz from the wmii homepage and save it to ~/incoming.

Unpack

$ cd ~/incoming
$ tar -zxvf wmii+ixp-snap20090703.tgz

make config
```
$ cd ~/incoming/wmii+ixp-snap20090703
$ make config
```
Accept all defaults except set PREFIX=/home/eliot/lib/wmii_20090703
make and make install
```
$ make 
$ make install
```

Create links in my ~/bin directory:

$ cd ~/bin
$ ln -s ~/lib/wmii_20090703/bin/* .
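
The linking step can also be scripted. Here is a minimal Python sketch of the same idea, assuming a Unix-like system where symlinks are allowed. The helper name `link_all` is hypothetical, and unlike `ln -s` it silently skips names that already exist in the destination instead of reporting an error.

```python
import os


def link_all(src_dir, dest_dir):
    """Symlink every entry of src_dir into dest_dir, skipping names that already exist."""
    created = []
    src_dir = os.path.abspath(src_dir)
    for name in sorted(os.listdir(src_dir)):
        link = os.path.join(dest_dir, name)
        if os.path.lexists(link):
            continue  # leave existing files and links alone
        os.symlink(os.path.join(src_dir, name), link)
        created.append(name)
    return created
```

For the install above, the call would look like `link_all(os.path.expanduser("~/lib/wmii_20090703/bin"), os.path.expanduser("~/bin"))`.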

Use the fun Python wmiirc

Copy the Python files to ~/.wmii-hg. This includes the pygmi directory, the pyxp directory, and the two wmiirc files.
```
$ cp -r ~/incoming/wmii+ixp-snap20090703/alternative_wmiircs/python/* ~/.wmii-hg/
```
Edit ~/.wmii-hg/wmiirc.py as desired.

Run wmii

Create ~/.xsession:

xmodmap ~/.Xmodmap
gnome-screensaver&
gnome-power-manager&
nm-applet --sm-disable&
urxvt&

until wmii; do
    true
done

Log out of the current window manager and then select X client script as the session and log in.
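
The `until wmii; do true; done` loop at the end of ~/.xsession reruns wmii whenever it exits with a non-zero status and lets the session end once wmii exits with status 0. The same control flow can be sketched in Python. The function name `run_until_success` and its `max_failures` safety limit are assumptions added for illustration; the shell loop has no such limit.

```python
import subprocess


def run_until_success(cmd, max_failures=None):
    """Re-run cmd until it exits with status 0; return the number of failed runs."""
    failures = 0
    while subprocess.call(cmd) != 0:
        failures += 1
        # Optional guard so a command that always fails cannot loop forever
        if max_failures is not None and failures >= max_failures:
            break
    return failures
```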

Install `slock` screen locker

Download from http://tools.suckless.org/slock

$ tar zxvf slock-0.9.tar.gz 
$ cd slock-0.9 
$ sudo make clean install

Running slock: Simply invoke the 'slock' command. To get out of it, enter your password.

Freetype header error

The first time around, I got the following error:

In file included from /usr/include/X11/Xft/Xft.h:39,
                 from ../include/x11.h:9,
                 from cmd/wmii9menu.c:49:
/usr/include/ft2build.h:56:38: error: freetype/config/ftheader.h: No such file or directory
In file included from ../include/x11.h:9,
                 from cmd/wmii9menu.c:49:
/usr/include/X11/Xft/Xft.h:40:10: error: #include expects "FILENAME" or <FILENAME>
In file included from ../include/x11.h:9,
                 from cmd/wmii9menu.c:49:
/usr/include/X11/Xft/Xft.h:60: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘_XftFTlibrary’
/usr/include/X11/Xft/Xft.h:94: error: expected specifier-qualifier-list before ‘FT_UInt’
/usr/include/X11/Xft/Xft.h:101: error: expected specifier-qualifier-list before ‘FT_UInt’
/usr/include/X11/Xft/Xft.h:198: error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
/usr/include/X11/Xft/Xft.h:303: error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
/usr/include/X11/Xft/Xft.h:362: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘XftLockFace’
/usr/include/X11/Xft/Xft.h:401: error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
/usr/include/X11/Xft/Xft.h:407: error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
/usr/include/X11/Xft/Xft.h:416: error: expected declaration specifiers or ‘...’ before ‘FT_UInt’
/usr/include/X11/Xft/Xft.h:417: error: expected declaration specifiers or ‘...’ before ‘FT_UInt’
/usr/include/X11/Xft/Xft.h:426: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘XftCharIndex’
/usr/include/X11/Xft/Xft.h:459: error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
make[1]: *** [wmii9menu.o] Error 1
make: *** [dall] Error 2

Per this discussion, I solved the problem by creating a symbolic link:

$ sudo ln -s /usr/include/freetype2/freetype /usr/include/freetype
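
The compiler looks for `freetype/config/ftheader.h`, while the headers live under /usr/include/freetype2/freetype, which is why the link helps. Here is a small Python sketch for checking whether the link is needed before running make. The function name `freetype_link_needed` is hypothetical, and the default path assumes the Ubuntu layout described above.

```python
import os


def freetype_link_needed(include_root="/usr/include"):
    """Return True if the freetype2 headers exist but include_root/freetype does not."""
    headers = os.path.join(include_root, "freetype2", "freetype")
    link = os.path.join(include_root, "freetype")
    # lexists also counts a dangling symlink as "already there"
    return os.path.isdir(headers) and not os.path.lexists(link)
```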

Find the N longest lines in a file with Python

2009-06-28T21:26:16-07:00

Here's a Python problem I attempted recently:

Write a program to read a multiple-line text file and write the N longest lines to a new file, where N and the file to be read are specified on the command line. Optimization is important.

Here's my solution:

import sys


def main(filename, n):
    """Find the n longest lines in filename and write them to filename + ".new"
    """
    with open(filename) as f:
        lines = f.readlines()
    # Sort longest first; key=len works on Python 2.4+ and Python 3
    lines.sort(key=len, reverse=True)
    with open(filename + ".new", "w") as f:
        f.write("".join(lines[:n]))


if __name__ == '__main__':
    main(sys.argv[1], int(sys.argv[2]))

What do you think? Is there a faster way?
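
One possible answer, as a sketch rather than a measured result: the solution above sorts every line, but only the top N are needed. The standard library's `heapq.nlargest` keeps just N candidates while scanning, which tends to help when N is small relative to the number of lines. The function names below are hypothetical.

```python
import heapq


def n_longest_lines(lines, n):
    """Return the n longest lines, longest first, without sorting everything."""
    return heapq.nlargest(n, lines, key=len)


def write_n_longest(filename, n):
    """Write the n longest lines of filename to filename + ".new"."""
    with open(filename) as src:
        longest = n_longest_lines(src, n)  # iterates the file line by line
    with open(filename + ".new", "w") as dst:
        dst.writelines(longest)
```

For large N relative to the file size, a plain sort is likely to be just as good or better, so it is worth timing both on real data.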

Notes on using pip and virtualenv with Django

2009-05-06T23:18:37-07:00

I have been using a symlinking method to install Python packages up to this point. To better handle dependencies and multiple versions I have wanted to switch over to pip and virtualenv. Pip is a better alternative to Easy Install and virtualenv is a tool to create isolated Python environments. I have wanted to use pip and virtualenv for a long time now. Finally, today, I took my first steps and created an environment with the Python packages required for this blog. My notes are below. (I am running Ubuntu Maverick and Python 2.6.) A lot of my notes on virtualenv are taken from Arthur Koziel's excellent tutorial.

Update 2012-03-14: Updated examples for pip 1.1 and virtualenv 1.7. As of pip 1.1, the -E option has been removed. As of virtualenv 1.7, the --no-site-packages behavior is the default and the option itself is deprecated. Use --system-site-packages if you want to include system site packages (the old default behavior). Examples are run on Ubuntu 10.10 Maverick Meerkat.

Install Easy Install

If you don't already have Easy Install, it can be installed as follows:

$ sudo apt-get install python-setuptools python-dev build-essential

Install pip

In most cases it is not necessary to install pip because it is included with virtualenv.

$ sudo easy_install -U pip 
install_dir /usr/local/lib/python2.6/dist-packages/
Searching for pip
Reading http://pypi.python.org/simple/pip/
Reading http://pip.openplans.org
Reading http://www.pip-installer.org
Best match: pip 1.1
Downloading http://pypi.python.org/packages/source/p/pip/pip-1.1.tar.gz#md5=62a9f08dd5dc69d76734568a6c040508
Processing pip-1.1.tar.gz
Running pip-1.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-i_5nEU/pip-1.1/egg-dist-tmp-ytHsCZ
warning: no files found matching '*.html' under directory 'docs'
warning: no previously-included files matching '*.txt' found under directory 'docs/_build'
no previously-included directories found matching 'docs/_build/_sources'
Adding pip 1.1 to easy-install.pth file
Installing pip script to /usr/local/bin
Installing pip-2.6 script to /usr/local/bin

Installed /usr/local/lib/python2.6/dist-packages/pip-1.1-py2.6.egg
Processing dependencies for pip
Finished processing dependencies for pip

Install virtualenv

$ sudo easy_install -U virtualenv 
install_dir /usr/local/lib/python2.6/dist-packages/
Searching for virtualenv
Reading http://pypi.python.org/simple/virtualenv/
Reading http://www.virtualenv.org
Reading http://virtualenv.openplans.org
Best match: virtualenv 1.7.1.2
Downloading http://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.7.1.2.tar.gz#md5=3be8a014c27340f48b56465f9109d9fa
Processing virtualenv-1.7.1.2.tar.gz
Running virtualenv-1.7.1.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-td3AoM/virtualenv-1.7.1.2/egg-dist-tmp-7dJVfO
warning: no previously-included files matching '*.*' found under directory 'docs/_templates'
Adding virtualenv 1.7.1.2 to easy-install.pth file
Installing virtualenv script to /usr/local/bin

Installed /usr/local/lib/python2.6/dist-packages/virtualenv-1.7.1.2-py2.6.egg
Processing dependencies for virtualenv
Finished processing dependencies for virtualenv

Create a virtual environment

This command creates the isolated virtual environment.

Update 2010-04-13: Updated to use distribute because it is the new hotness.

$ cd /tmp

$ virtualenv --distribute myenv
New python executable in myenv/bin/python
Installing distribute.............................................................................................................................................................................................done.
Installing pip...............done.
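
On Python 3.3 and later, the standard library's `venv` module can create a similar isolated environment from Python code. This is a minimal sketch using `venv` rather than the virtualenv tool shown above. The function name `create_env` is hypothetical, and `with_pip=False` is chosen only to keep the sketch fast and offline.

```python
import os
import venv


def create_env(env_dir, system_site_packages=False):
    """Create a virtual environment with the standard-library venv module."""
    # system_site_packages=True plays the role of virtualenv's --system-site-packages.
    # Pass with_pip=True instead to bootstrap pip into the new environment.
    venv.create(env_dir, system_site_packages=system_site_packages, with_pip=False)
    return os.path.join(env_dir, "pyvenv.cfg")
```

The returned pyvenv.cfg file records whether system site packages are visible inside the environment.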

Create a virtualenv with a different version of Python

This creates a virtualenv that uses Python 2.7 instead of the default Python 2.6.

$ virtualenv --distribute --python=/usr/bin/python2.7 myenv-py27
Running virtualenv with interpreter /usr/bin/python2.7
New python executable in myenv-py27/bin/python2.7
Also creating executable in myenv-py27/bin/python
Installing distribute.............................................................................................................................................................................................done.
Installing pip...............done.

Clear the PYTHONPATH variable

I don't know if this is necessary, but I had a problem with the akismet module when this was set. Maybe I did something wrong, but when I cleared PYTHONPATH, it worked.

$ export PYTHONPATH=

Install a package (Yolk) in the new virtual environment

Activate the virtual environment and install Yolk inside it. (Yolk is a tool that lists Python packages.)

$ source /tmp/myenv/bin/activate 
$ pip install yolk 
Downloading/unpacking yolk
  Downloading yolk-0.4.3.tar.gz (86Kb): 86Kb downloaded
  Running setup.py egg_info for package yolk
    warning: no files found matching '*.txt' under directory 'tests'
    warning: no files found matching '*.conf' under directory 'docs'
    warning: no files found matching '*.css_t' under directory 'docs'
    warning: no files found matching 'indexsidebar.html' under directory 'docs'
    warning: no files found matching 'tests/test_cli.py'
Requirement already satisfied (use --upgrade to upgrade): distribute in ./myenv/lib/python2.6/site-packages/distribute-0.6.24-py2.6.egg (from yolk)
Installing collected packages: yolk
  Running setup.py install for yolk
    warning: no files found matching '*.txt' under directory 'tests'
    warning: no files found matching '*.conf' under directory 'docs'
    warning: no files found matching '*.css_t' under directory 'docs'
    warning: no files found matching 'indexsidebar.html' under directory 'docs'
    warning: no files found matching 'tests/test_cli.py'
    Installing yolk script to /tmp/myenv/bin
Successfully installed yolk
Cleaning up...

Use the virtual environment

Activate the virtual environment
```
$ source /tmp/myenv/bin/activate
```

Run yolk to list your installed packages:

$ yolk -l
Python          - 2.6.6        - active development (/usr/lib/python2.6/lib-dynload)
distribute      - 0.6.24       - active 
pip             - 1.1          - active 
wsgiref         - 0.1.2        - active development (/usr/lib/python2.6)
yolk            - 0.4.3        - active

Deactivate the environment
```
$ deactivate
```
Try running yolk
```
$ yolk -l
yolk: command not found
```
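
To check from inside Python whether the environment is active, the interpreter's prefix attributes can be compared. Here is a minimal sketch, assuming the usual conventions: older virtualenv releases set `sys.real_prefix`, while `venv` and newer virtualenv releases make `sys.base_prefix` differ from `sys.prefix`. The function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases record the original prefix in sys.real_prefix
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv keep the original prefix in sys.base_prefix
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix
```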

Install Django in the virtual environment

$ source /tmp/myenv/bin/activate
$ pip install Django
Downloading/unpacking Django
  Downloading Django-1.3.1.tar.gz (6.5Mb): 6.5Mb downloaded
  Running setup.py egg_info for package Django
Installing collected packages: Django
  Running setup.py install for Django
    changing mode of build/scripts-2.6/django-admin.py from 644 to 755
    changing mode of /tmp/myenv/bin/django-admin.py to 755
Successfully installed Django
Cleaning up...

Create a requirements file from existing library versions using pip freeze

The pip freeze command allows you to take a snapshot of the exact versions of all your Python libraries. For more information, see the documentation on pip freeze.

$ source /tmp/myenv/bin/activate
$ pip freeze > /tmp/requirements.txt

$ cat /tmp/requirements.txt 
Django==1.3.1
distribute==0.6.24
wsgiref==0.1.2
yolk==0.4.3
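
The `name==version` lines that pip freeze writes are easy to read back in a script, for example to compare two environments. Here is a minimal sketch; the function name `parse_pinned` is hypothetical, and it deliberately ignores anything that is not a plain `name==version` pin.

```python
def parse_pinned(text):
    """Map package name -> version for every `name==version` line in text."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and entries without an exact pin
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins
```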

Uninstall a package

$ source /tmp/myenv/bin/activate
$ pip uninstall Django
Uninstalling Django:
  /tmp/myenv/bin/django-admin.py
  /tmp/myenv/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg-info
  /tmp/myenv/lib/python2.6/site-packages/django
Proceed (y/n)? y
  Successfully uninstalled Django

Install libraries based on a requirements file

Once you have a requirements file, you can use pip to install the exact versions of the libraries specified in your requirements file. For more information, see the documentation on pip requirements files. Here's my requirements file for this blog, /tmp/saltycrane-requirements.txt:

psycopg2==2.4.2
Django==1.3.1
Markdown==2.0
http://www.crummy.com/software/BeautifulSoup/download/3.x/BeautifulSoup-3.0.7a.tar.gz
Pygments==1.3.1
Twisted==10.0.0

http://django-tagging.googlecode.com/files/django-tagging-0.3.1.tar.gz
-e hg+http://bitbucket.org/ubernostrum/django-contact-form/#egg=django-contact-form
-e hg+http://bitbucket.org/jezdez/akismet/#egg=akismet

Fabric==1.3.1
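
The file above mixes three kinds of entries: exact version pins, direct archive URLs, and editable checkouts marked with -e. Here is a rough Python sketch that tells them apart. The function name `classify_requirement` and its labels are hypothetical, and the checks are simple string tests, not pip's own parser.

```python
def classify_requirement(line):
    """Return 'editable', 'url', 'pinned', 'other', or None for a blank or comment line."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    if line.startswith("-e ") or line.startswith("--editable "):
        return "editable"  # version-control checkout installed in development mode
    if "://" in line:
        return "url"  # direct link to an archive
    if "==" in line:
        return "pinned"  # exact version from the package index
    return "other"
```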

Here's the command to install using my requirements file:

$ source /tmp/myenv/bin/activate
$ pip install -r /tmp/saltycrane-requirements.txt
Downloading/unpacking http://www.crummy.com/software/BeautifulSoup/download/3.x/BeautifulSoup-3.0.7a.tar.gz (from -r /tmp/saltycrane-requirements.txt (line 7))
  Downloading BeautifulSoup-3.0.7a.tar.gz
  Running setup.py egg_info for package from http://www.crummy.com/software/BeautifulSoup/download/3.x/BeautifulSoup-3.0.7a.tar.gz
Downloading/unpacking http://django-tagging.googlecode.com/files/django-tagging-0.3.1.tar.gz (from -r /tmp/saltycrane-requirements.txt (line 11))
  Downloading django-tagging-0.3.1.tar.gz
  Running setup.py egg_info for package from http://django-tagging.googlecode.com/files/django-tagging-0.3.1.tar.gz
Downloading/unpacking psycopg2==2.4.2 (from -r /tmp/saltycrane-requirements.txt (line 1))
  Downloading psycopg2-2.4.2.tar.gz (667Kb): 667Kb downloaded
  Running setup.py egg_info for package psycopg2
    no previously-included directories found matching 'doc/src/_build'
Downloading/unpacking Django==1.3.1 (from -r /tmp/saltycrane-requirements.txt (line 4))
  Downloading Django-1.3.1.tar.gz (6.5Mb): 6.5Mb downloaded
  Running setup.py egg_info for package Django
Downloading/unpacking Markdown==2.0 (from -r /tmp/saltycrane-requirements.txt (line 5))
  Downloading Markdown-2.0.zip (93Kb): 93Kb downloaded
  Running setup.py egg_info for package Markdown
Downloading/unpacking Pygments==1.3.1 (from -r /tmp/saltycrane-requirements.txt (line 8))
  Downloading Pygments-1.3.1.tar.gz (1.1Mb): 1.1Mb downloaded
  Running setup.py egg_info for package Pygments
Downloading/unpacking Twisted==10.0.0 (from -r /tmp/saltycrane-requirements.txt (line 9))
  Downloading Twisted-10.0.0.tar.bz2 (2.6Mb): 2.6Mb downloaded
  Running setup.py egg_info for package Twisted
Obtaining django-contact-form from hg+http://bitbucket.org/ubernostrum/django-contact-form/#egg=django-contact-form (from -r /tmp/saltycrane-requirements.txt (line 12))
  Cloning hg http://bitbucket.org/ubernostrum/django-contact-form/ to ./myenv/src/django-contact-form
  Running setup.py egg_info for package django-contact-form
Obtaining akismet from hg+http://bitbucket.org/jezdez/akismet/#egg=akismet (from -r /tmp/saltycrane-requirements.txt (line 13))
  Cloning hg http://bitbucket.org/jezdez/akismet/ to ./myenv/src/akismet
  Running setup.py egg_info for package akismet
Downloading/unpacking Fabric==1.3.1 (from -r /tmp/saltycrane-requirements.txt (line 15))
  Downloading Fabric-1.3.1.tar.gz (167Kb): 167Kb downloaded
  Running setup.py egg_info for package Fabric
    warning: no previously-included files matching '*' found under directory 'docs/_build'
    warning: no files found matching 'fabfile.py'
Downloading/unpacking zope.interface (from Twisted==10.0.0->-r /tmp/saltycrane-requirements.txt (line 9))
  Downloading zope.interface-3.8.0.tar.gz (111Kb): 111Kb downloaded
  Running setup.py egg_info for package zope.interface
Downloading/unpacking ssh>=1.7.8 (from Fabric==1.3.1->-r /tmp/saltycrane-requirements.txt (line 15))
  Downloading ssh-1.7.13.tar.gz (790Kb): 790Kb downloaded
  Running setup.py egg_info for package ssh
Requirement already satisfied (use --upgrade to upgrade): distribute in ./myenv/lib/python2.6/site-packages/distribute-0.6.24-py2.6.egg (from zope.interface->Twisted==10.0.0->-r /tmp/saltycrane-requirements.txt (line 9))
Downloading/unpacking pycrypto>=2.1,!=2.4 (from ssh>=1.7.8->Fabric==1.3.1->-r /tmp/saltycrane-requirements.txt (line 15))
  Downloading pycrypto-2.5.tar.gz (426Kb): 426Kb downloaded
  Running setup.py egg_info for package pycrypto
Installing collected packages: psycopg2, Django, Markdown, Pygments, Twisted, django-contact-form, akismet, Fabric, BeautifulSoup, django-tagging, zope.interface, ssh, pycrypto
  Running setup.py install for psycopg2
    building 'psycopg2._psycopg' extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/psycopgmodule.c -o build/temp.linux-x86_64-2.6/psycopg/psycopgmodule.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/green.c -o build/temp.linux-x86_64-2.6/psycopg/green.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/pqpath.c -o build/temp.linux-x86_64-2.6/psycopg/pqpath.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/utils.c -o build/temp.linux-x86_64-2.6/psycopg/utils.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/bytes_format.c -o build/temp.linux-x86_64-2.6/psycopg/bytes_format.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/connection_int.c -o build/temp.linux-x86_64-2.6/psycopg/connection_int.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/connection_type.c -o build/temp.linux-x86_64-2.6/psycopg/connection_type.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/cursor_int.c -o build/temp.linux-x86_64-2.6/psycopg/cursor_int.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/cursor_type.c -o build/temp.linux-x86_64-2.6/psycopg/cursor_type.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/lobject_int.c -o build/temp.linux-x86_64-2.6/psycopg/lobject_int.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/lobject_type.c -o build/temp.linux-x86_64-2.6/psycopg/lobject_type.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/notify_type.c -o build/temp.linux-x86_64-2.6/psycopg/notify_type.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/xid_type.c -o build/temp.linux-x86_64-2.6/psycopg/xid_type.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/adapter_asis.c -o build/temp.linux-x86_64-2.6/psycopg/adapter_asis.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/adapter_binary.c -o build/temp.linux-x86_64-2.6/psycopg/adapter_binary.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/adapter_datetime.c -o build/temp.linux-x86_64-2.6/psycopg/adapter_datetime.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/adapter_list.c -o build/temp.linux-x86_64-2.6/psycopg/adapter_list.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/adapter_pboolean.c -o build/temp.linux-x86_64-2.6/psycopg/adapter_pboolean.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/adapter_pdecimal.c -o build/temp.linux-x86_64-2.6/psycopg/adapter_pdecimal.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/adapter_pint.c -o build/temp.linux-x86_64-2.6/psycopg/adapter_pint.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/adapter_pfloat.c -o build/temp.linux-x86_64-2.6/psycopg/adapter_pfloat.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/adapter_qstring.c -o build/temp.linux-x86_64-2.6/psycopg/adapter_qstring.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/microprotocols.c -o build/temp.linux-x86_64-2.6/psycopg/microprotocols.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/microprotocols_proto.c -o build/temp.linux-x86_64-2.6/psycopg/microprotocols_proto.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/typecast.c -o build/temp.linux-x86_64-2.6/psycopg/typecast.o -Wdeclaration-after-statement
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DHAVE_MXDATETIME=1 -DPSYCOPG_DEFAULT_PYDATETIME=1 -DPSYCOPG_VERSION="2.4.2 (dt dec mx pq3 ext)" -DPG_VERSION_HEX=0x08040B -DPSYCOPG_EXTENSIONS=1 -DPSYCOPG_NEW_BOOLEAN=1 -DHAVE_PQFREEMEM=1 -I/usr/include/python2.6/mx -I/usr/include/python2.6 -I. -I/usr/include/postgresql -I/usr/include/postgresql/8.4/server -c psycopg/adapter_mxdatetime.c -o build/temp.linux-x86_64-2.6/psycopg/adapter_mxdatetime.o -Wdeclaration-after-statement
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/psycopg/psycopgmodule.o build/temp.linux-x86_64-2.6/psycopg/green.o build/temp.linux-x86_64-2.6/psycopg/pqpath.o build/temp.linux-x86_64-2.6/psycopg/utils.o build/temp.linux-x86_64-2.6/psycopg/bytes_format.o build/temp.linux-x86_64-2.6/psycopg/connection_int.o build/temp.linux-x86_64-2.6/psycopg/connection_type.o build/temp.linux-x86_64-2.6/psycopg/cursor_int.o build/temp.linux-x86_64-2.6/psycopg/cursor_type.o build/temp.linux-x86_64-2.6/psycopg/lobject_int.o build/temp.linux-x86_64-2.6/psycopg/lobject_type.o build/temp.linux-x86_64-2.6/psycopg/notify_type.o build/temp.linux-x86_64-2.6/psycopg/xid_type.o build/temp.linux-x86_64-2.6/psycopg/adapter_asis.o build/temp.linux-x86_64-2.6/psycopg/adapter_binary.o build/temp.linux-x86_64-2.6/psycopg/adapter_datetime.o build/temp.linux-x86_64-2.6/psycopg/adapter_list.o build/temp.linux-x86_64-2.6/psycopg/adapter_pboolean.o build/temp.linux-x86_64-2.6/psycopg/adapter_pdecimal.o build/temp.linux-x86_64-2.6/psycopg/adapter_pint.o build/temp.linux-x86_64-2.6/psycopg/adapter_pfloat.o build/temp.linux-x86_64-2.6/psycopg/adapter_qstring.o build/temp.linux-x86_64-2.6/psycopg/microprotocols.o build/temp.linux-x86_64-2.6/psycopg/microprotocols_proto.o build/temp.linux-x86_64-2.6/psycopg/typecast.o build/temp.linux-x86_64-2.6/psycopg/adapter_mxdatetime.o -lpq -o build/lib.linux-x86_64-2.6/psycopg2/_psycopg.so
    no previously-included directories found matching 'doc/src/_build'
  Running setup.py install for Django
    changing mode of build/scripts-2.6/django-admin.py from 644 to 755
    changing mode of /tmp/myenv/bin/django-admin.py to 755
  Running setup.py install for Markdown
    changing mode of build/scripts-2.6/markdown.py from 644 to 755
    changing mode of /tmp/myenv/bin/markdown.py to 755
  Running setup.py install for Pygments
    Installing pygmentize script to /tmp/myenv/bin
  Running setup.py install for Twisted
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.6 -c conftest.c -o conftest.o
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.6 -c conftest.c -o conftest.o
    building 'twisted.runner.portmap' extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.6 -c twisted/runner/portmap.c -o build/temp.linux-x86_64-2.6/twisted/runner/portmap.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/twisted/runner/portmap.o -o build/lib.linux-x86_64-2.6/twisted/runner/portmap.so
    building 'twisted.protocols._c_urlarg' extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.6 -c twisted/protocols/_c_urlarg.c -o build/temp.linux-x86_64-2.6/twisted/protocols/_c_urlarg.o
    twisted/protocols/_c_urlarg.c: In function ‘unquote’:
    twisted/protocols/_c_urlarg.c:41: warning: ‘tmp’ may be used uninitialized in this function
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/twisted/protocols/_c_urlarg.o -o build/lib.linux-x86_64-2.6/twisted/protocols/_c_urlarg.so
    building 'twisted.test.raiser' extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.6 -c twisted/test/raiser.c -o build/temp.linux-x86_64-2.6/twisted/test/raiser.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/twisted/test/raiser.o -o build/lib.linux-x86_64-2.6/twisted/test/raiser.so
    building 'twisted.python._epoll' extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.6 -c twisted/python/_epoll.c -o build/temp.linux-x86_64-2.6/twisted/python/_epoll.o
    twisted/python/_epoll.c: In function ‘__pyx_f_6_epoll_5epoll___dealloc__’:
    twisted/python/_epoll.c:168: warning: label ‘__pyx_L1’ defined but not used
    twisted/python/_epoll.c: In function ‘__pyx_f_6_epoll_5epoll_wait’:
    twisted/python/_epoll.c:432: warning: label ‘__pyx_L7’ defined but not used
    twisted/python/_epoll.c:430: warning: label ‘__pyx_L6’ defined but not used
    twisted/python/_epoll.c: In function ‘__pyx_tp_new_6_epoll_epoll’:
    twisted/python/_epoll.c:508: warning: unused variable ‘p’
    twisted/python/_epoll.c: In function ‘__pyx_tp_dealloc_6_epoll_epoll’:
    twisted/python/_epoll.c:513: warning: unused variable ‘p’
    twisted/python/_epoll.c: In function ‘__pyx_tp_traverse_6_epoll_epoll’:
    twisted/python/_epoll.c:528: warning: unused variable ‘p’
    twisted/python/_epoll.c:527: warning: unused variable ‘e’
    twisted/python/_epoll.c: In function ‘__pyx_tp_clear_6_epoll_epoll’:
    twisted/python/_epoll.c:533: warning: unused variable ‘p’
    twisted/python/_epoll.c: At top level:
    twisted/python/_epoll.c:32: warning: ‘__Pyx_UnpackItem’ declared ‘static’ but never defined
    twisted/python/_epoll.c:33: warning: ‘__Pyx_EndUnpack’ declared ‘static’ but never defined
    twisted/python/_epoll.c:34: warning: ‘__Pyx_PrintItem’ declared ‘static’ but never defined
    twisted/python/_epoll.c:35: warning: ‘__Pyx_PrintNewline’ declared ‘static’ but never defined
    twisted/python/_epoll.c:37: warning: ‘__Pyx_ReRaise’ declared ‘static’ but never defined
    twisted/python/_epoll.c:38: warning: ‘__Pyx_Import’ declared ‘static’ but never defined
    twisted/python/_epoll.c:39: warning: ‘__Pyx_GetExcValue’ declared ‘static’ but never defined
    twisted/python/_epoll.c:40: warning: ‘__Pyx_ArgTypeTest’ declared ‘static’ but never defined
    twisted/python/_epoll.c:41: warning: ‘__Pyx_TypeTest’ declared ‘static’ but never defined
    twisted/python/_epoll.c:42: warning: ‘__Pyx_GetStarArgs’ declared ‘static’ but never defined
    twisted/python/_epoll.c:43: warning: ‘__Pyx_WriteUnraisable’ declared ‘static’ but never defined
    twisted/python/_epoll.c:45: warning: ‘__Pyx_ImportType’ declared ‘static’ but never defined
    twisted/python/_epoll.c:46: warning: ‘__Pyx_SetVtable’ declared ‘static’ but never defined
    twisted/python/_epoll.c:47: warning: ‘__Pyx_GetVtable’ declared ‘static’ but never defined
    twisted/python/_epoll.c:48: warning: ‘__Pyx_CreateClass’ declared ‘static’ but never defined
    twisted/python/_epoll.c:50: warning: ‘__Pyx_InitStrings’ declared ‘static’ but never defined
    twisted/python/_epoll.c:51: warning: ‘__Pyx_InitCApi’ declared ‘static’ but never defined
    twisted/python/_epoll.c:52: warning: ‘__Pyx_ImportModuleCApi’ declared ‘static’ but never defined
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/twisted/python/_epoll.o -o build/lib.linux-x86_64-2.6/twisted/python/_epoll.so
    building 'twisted.python._initgroups' extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.6 -c twisted/python/_initgroups.c -o build/temp.linux-x86_64-2.6/twisted/python/_initgroups.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/twisted/python/_initgroups.o -o build/lib.linux-x86_64-2.6/twisted/python/_initgroups.so
    changing mode of build/scripts-2.6/mktap from 644 to 755
    changing mode of build/scripts-2.6/tap2deb from 644 to 755
    changing mode of build/scripts-2.6/tap2rpm from 644 to 755
    changing mode of build/scripts-2.6/trial from 644 to 755
    changing mode of build/scripts-2.6/pyhtmlizer from 644 to 755
    changing mode of build/scripts-2.6/tapconvert from 644 to 755
    changing mode of build/scripts-2.6/manhole from 644 to 755
    changing mode of build/scripts-2.6/twistd from 644 to 755
    changing mode of build/scripts-2.6/ckeygen from 644 to 755
    changing mode of build/scripts-2.6/cftp from 644 to 755
    changing mode of build/scripts-2.6/conch from 644 to 755
    changing mode of build/scripts-2.6/tkconch from 644 to 755
    changing mode of build/scripts-2.6/mailmail from 644 to 755
    changing mode of build/scripts-2.6/lore from 644 to 755
    changing mode of /tmp/myenv/bin/mktap to 755
    changing mode of /tmp/myenv/bin/ckeygen to 755
    changing mode of /tmp/myenv/bin/cftp to 755
    changing mode of /tmp/myenv/bin/conch to 755
    changing mode of /tmp/myenv/bin/tap2deb to 755
    changing mode of /tmp/myenv/bin/tap2rpm to 755
    changing mode of /tmp/myenv/bin/lore to 755
    changing mode of /tmp/myenv/bin/trial to 755
    changing mode of /tmp/myenv/bin/pyhtmlizer to 755
    changing mode of /tmp/myenv/bin/tkconch to 755
    changing mode of /tmp/myenv/bin/tapconvert to 755
    changing mode of /tmp/myenv/bin/manhole to 755
    changing mode of /tmp/myenv/bin/twistd to 755
    changing mode of /tmp/myenv/bin/mailmail to 755
  Running setup.py develop for django-contact-form
    Creating /tmp/myenv/lib/python2.6/site-packages/django-contact-form.egg-link (link to .)
    Adding django-contact-form 0.3 to easy-install.pth file
    
    Installed /tmp/myenv/src/django-contact-form
  Running setup.py develop for akismet
    Creating /tmp/myenv/lib/python2.6/site-packages/akismet.egg-link (link to .)
    Adding akismet 0.1.5 to easy-install.pth file
    
    Installed /tmp/myenv/src/akismet
  Running setup.py install for Fabric
    warning: no previously-included files matching '*' found under directory 'docs/_build'
    warning: no files found matching 'fabfile.py'
    Installing fab script to /tmp/myenv/bin
  Running setup.py install for BeautifulSoup
  Running setup.py install for django-tagging
  Running setup.py install for zope.interface
    building 'zope.interface._zope_interface_coptimizations' extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.6 -c src/zope/interface/_zope_interface_coptimizations.c -o build/temp.linux-x86_64-2.6/src/zope/interface/_zope_interface_coptimizations.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/zope/interface/_zope_interface_coptimizations.o -o build/lib.linux-x86_64-2.6/zope/interface/_zope_interface_coptimizations.so
    Skipping installation of /tmp/myenv/lib/python2.6/site-packages/zope/__init__.py (namespace package)
    Installing /tmp/myenv/lib/python2.6/site-packages/zope.interface-3.8.0-py2.6-nspkg.pth
  Running setup.py install for ssh
  Running setup.py install for pycrypto
    checking for gcc... gcc
    checking whether the C compiler works... yes
    checking for C compiler default output file name... a.out
    checking for suffix of executables...
    checking whether we are cross compiling... no
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether gcc accepts -g... yes
    checking for gcc option to accept ISO C89... none needed
    checking for __gmpz_init in -lgmp... no
    checking for __gmpz_init in -lmpir... no
    checking whether mpz_powm is declared... no
    checking whether mpz_powm_sec is declared... no
    checking how to run the C preprocessor... gcc -E
    checking for grep that handles long lines and -e... /bin/grep
    checking for egrep... /bin/grep -E
    checking for ANSI C header files... yes
    checking for sys/types.h... yes
    checking for sys/stat.h... yes
    checking for stdlib.h... yes
    checking for string.h... yes
    checking for memory.h... yes
    checking for strings.h... yes
    checking for inttypes.h... yes
    checking for stdint.h... yes
    checking for unistd.h... yes
    checking for inttypes.h... (cached) yes
    checking limits.h usability... yes
    checking limits.h presence... yes
    checking for limits.h... yes
    checking stddef.h usability... yes
    checking stddef.h presence... yes
    checking for stddef.h... yes
    checking for stdint.h... (cached) yes
    checking for stdlib.h... (cached) yes
    checking for string.h... (cached) yes
    checking wchar.h usability... yes
    checking wchar.h presence... yes
    checking for wchar.h... yes
    checking for inline... inline
    checking for int16_t... yes
    checking for int32_t... yes
    checking for int64_t... yes
    checking for int8_t... yes
    checking for size_t... yes
    checking for uint16_t... yes
    checking for uint32_t... yes
    checking for uint64_t... yes
    checking for uint8_t... yes
    checking for stdlib.h... (cached) yes
    checking for GNU libc compatible malloc... yes
    checking for memmove... yes
    checking for memset... yes
    configure: creating ./config.status
    config.status: creating src/config.h
    warning: GMP or MPIR library not found; Not building Crypto.PublicKey._fastmath.
    building 'Crypto.Hash._MD2' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -I/usr/include/python2.6 -c src/MD2.c -o build/temp.linux-x86_64-2.6/src/MD2.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/MD2.o -o build/lib.linux-x86_64-2.6/Crypto/Hash/_MD2.so
    building 'Crypto.Hash._MD4' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -I/usr/include/python2.6 -c src/MD4.c -o build/temp.linux-x86_64-2.6/src/MD4.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/MD4.o -o build/lib.linux-x86_64-2.6/Crypto/Hash/_MD4.so
    building 'Crypto.Hash._SHA256' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -I/usr/include/python2.6 -c src/SHA256.c -o build/temp.linux-x86_64-2.6/src/SHA256.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/SHA256.o -o build/lib.linux-x86_64-2.6/Crypto/Hash/_SHA256.so
    building 'Crypto.Hash._SHA224' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -I/usr/include/python2.6 -c src/SHA224.c -o build/temp.linux-x86_64-2.6/src/SHA224.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/SHA224.o -o build/lib.linux-x86_64-2.6/Crypto/Hash/_SHA224.so
    building 'Crypto.Hash._SHA384' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -I/usr/include/python2.6 -c src/SHA384.c -o build/temp.linux-x86_64-2.6/src/SHA384.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/SHA384.o -o build/lib.linux-x86_64-2.6/Crypto/Hash/_SHA384.so
    building 'Crypto.Hash._SHA512' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -I/usr/include/python2.6 -c src/SHA512.c -o build/temp.linux-x86_64-2.6/src/SHA512.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/SHA512.o -o build/lib.linux-x86_64-2.6/Crypto/Hash/_SHA512.so
    building 'Crypto.Hash._RIPEMD160' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -DPCT_LITTLE_ENDIAN=1 -Isrc/ -I/usr/include/python2.6 -c src/RIPEMD160.c -o build/temp.linux-x86_64-2.6/src/RIPEMD160.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/RIPEMD160.o -o build/lib.linux-x86_64-2.6/Crypto/Hash/_RIPEMD160.so
    building 'Crypto.Cipher.AES' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -I/usr/include/python2.6 -c src/AES.c -o build/temp.linux-x86_64-2.6/src/AES.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/AES.o -o build/lib.linux-x86_64-2.6/Crypto/Cipher/AES.so
    building 'Crypto.Cipher.ARC2' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -I/usr/include/python2.6 -c src/ARC2.c -o build/temp.linux-x86_64-2.6/src/ARC2.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/ARC2.o -o build/lib.linux-x86_64-2.6/Crypto/Cipher/ARC2.so
    building 'Crypto.Cipher.Blowfish' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -I/usr/include/python2.6 -c src/Blowfish.c -o build/temp.linux-x86_64-2.6/src/Blowfish.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/Blowfish.o -o build/lib.linux-x86_64-2.6/Crypto/Cipher/Blowfish.so
    building 'Crypto.Cipher.CAST' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -I/usr/include/python2.6 -c src/CAST.c -o build/temp.linux-x86_64-2.6/src/CAST.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/CAST.o -o build/lib.linux-x86_64-2.6/Crypto/Cipher/CAST.so
    building 'Crypto.Cipher.DES' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -Isrc/libtom/ -I/usr/include/python2.6 -c src/DES.c -o build/temp.linux-x86_64-2.6/src/DES.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/DES.o -o build/lib.linux-x86_64-2.6/Crypto/Cipher/DES.so
    building 'Crypto.Cipher.DES3' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -Isrc/libtom/ -I/usr/include/python2.6 -c src/DES3.c -o build/temp.linux-x86_64-2.6/src/DES3.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/DES3.o -o build/lib.linux-x86_64-2.6/Crypto/Cipher/DES3.so
    building 'Crypto.Cipher.ARC4' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -I/usr/include/python2.6 -c src/ARC4.c -o build/temp.linux-x86_64-2.6/src/ARC4.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/ARC4.o -o build/lib.linux-x86_64-2.6/Crypto/Cipher/ARC4.so
    building 'Crypto.Cipher.XOR' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -I/usr/include/python2.6 -c src/XOR.c -o build/temp.linux-x86_64-2.6/src/XOR.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/XOR.o -o build/lib.linux-x86_64-2.6/Crypto/Cipher/XOR.so
    building 'Crypto.Util.strxor' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -I/usr/include/python2.6 -c src/strxor.c -o build/temp.linux-x86_64-2.6/src/strxor.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/strxor.o -o build/lib.linux-x86_64-2.6/Crypto/Util/strxor.so
    building 'Crypto.Util._counter' extension
    gcc -pthread -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -fPIC -std=c99 -O3 -fomit-frame-pointer -Isrc/ -I/usr/include/python2.6 -c src/_counter.c -o build/temp.linux-x86_64-2.6/src/_counter.o
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/src/_counter.o -o build/lib.linux-x86_64-2.6/Crypto/Util/_counter.so
Successfully installed psycopg2 Django Markdown Pygments Twisted django-contact-form akismet Fabric BeautifulSoup django-tagging zope.interface ssh pycrypto
Cleaning up...

Use virtualenv with Django and mod_python

Here is how to use a virtualenv with mod_python. This is taken from the post "Django, mod_python and virtualenv". For more information, see the virtualenv documentation on using virtualenv without bin/python.

Create a file /srv/SaltyCrane/myvirtualdjango.py:

activate_this = "/srv/python-environments/saltycrane/bin/activate_this.py"
execfile(activate_this, dict(__file__=activate_this))

from django.core.handlers.modpython import handler

Edit your httpd.conf

    <Location "/">
        SetHandler python-program
        PythonHandler myvirtualdjango
        SetEnv DJANGO_SETTINGS_MODULE iwiwdsmi.settings
        PythonPath "['/srv/SaltyCrane',] + sys.path"
        PythonDebug Off
    </Location>

Use virtualenv with Django and mod_wsgi

Added 2009-09-27: Here is how I set up my virtualenv with mod_wsgi. To use the packages in my virtualenv, I used site.addsitedir at the top of my .wsgi application file. You may also want to set the WSGIPythonHome variable in your httpd.conf file (outside of any VirtualHost sections). For detailed information on using mod_wsgi with virtualenv, see the Virtual Environments section of the modwsgi project documentation.

/srv/SaltyCrane/saltycrane.wsgi:

import os
import sys
import site

site.addsitedir('/srv/python-environments/saltycrane/lib/python2.5/site-packages')

os.environ['DJANGO_SETTINGS_MODULE'] = 'iwiwdsmi.settings'

sys.path.append('/srv/SaltyCrane')

import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()

httpd.conf:

    WSGIScriptAlias / /srv/SaltyCrane/saltycrane.wsgi
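
To see what site.addsitedir does for the .wsgi file above, here is a minimal sketch written for Python 3. The temporary directory is only a stand-in for a virtualenv's site-packages directory (an assumption for illustration, not a real environment).

```python
import os
import site
import sys
import tempfile

# Stand-in for a virtualenv's site-packages directory (hypothetical, created on the fly).
fake_site_packages = tempfile.mkdtemp(prefix="fake-site-packages-")

# addsitedir() appends the directory to sys.path and processes any .pth files in it.
site.addsitedir(fake_site_packages)

# Compare real paths so symlinked temp directories still match.
on_path = any(
    os.path.realpath(p) == os.path.realpath(fake_site_packages) for p in sys.path
)
print(on_path)
```

Once the directory is on sys.path, packages installed in it can be imported by the rest of the .wsgi file.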

Background / Discussion

Below are links to essential information on pip, virtualenv, and Python packaging (mostly from the creator of pip and virtualenv, Ian Bicking), plus some further discussion on the state of Python packaging. (Updated 2012-11-17)

Workingenv is dead, long live Virtualenv! by Ian Bicking (2007-10-10)
mailing list post explaining a number of package management technologies by Kevin Teague (2008-09-14)
pyinstall: A New Hope by Ian Bicking (2008-09-24)
pyinstall pybundles by Ian Bicking (2008-10-01)
pyinstall is dead, long live pip! by Ian Bicking (2008-10-28)
On packaging by James Bennett (2008-12-14)
A Few Corrections To "On Packaging" by Ian Bicking (2008-12-14)
Why I like pip by James Bennett (2008-12-15)
Using pip Requirements by Ian Bicking (2008-12-16)
The Hitchhiker’s Guide to Packaging (Documentation for distribute) (Added 2010-04-13)
Dear Django, help Python Packaging by Tarek Ziadé (2012-09-10)
Brett Cannon's summary of various PEPs regarding packaging (2012-09-11)
Dear Pyramid, help Python Packaging by Tarek Ziadé (2012-09-12)
The Chronology of Python Packaging Part 1, Part 2 by Tarek Ziadé (2012-11-17)

Converting time zones for datetime objects in Python

2009-05-05T20:25:20-07:00

Install pytz

I am using pytz, which is a time zone definitions package. You can install it using Easy Install. On Ubuntu, do this:

sudo easy_install --upgrade pytz

Add time zone information to a naive datetime object

from datetime import datetime
from pytz import timezone

date_str = "2009-05-05 22:28:15"
datetime_obj = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S")
datetime_obj_utc = datetime_obj.replace(tzinfo=timezone('UTC'))
print datetime_obj_utc.strftime("%Y-%m-%d %H:%M:%S %Z%z")

Results:

2009-05-05 22:28:15 UTC+0000

Add non-UTC time zone information to a naive datetime object

(Added 2014-05-28)

NOTE: datetime.replace() does not handle daylight saving time correctly. The correct way is to use timezone.localize() instead. Using datetime.replace() is OK when working with UTC as shown above because UTC has no daylight saving time transitions to deal with. See the pytz documentation.

from datetime import datetime
from pytz import timezone

date_str = "2014-05-28 22:28:15"
datetime_obj_naive = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S")

# Wrong way!
datetime_obj_pacific = datetime_obj_naive.replace(tzinfo=timezone('US/Pacific'))
print datetime_obj_pacific.strftime("%Y-%m-%d %H:%M:%S %Z%z")

# Right way!
datetime_obj_pacific = timezone('US/Pacific').localize(datetime_obj_naive)
print datetime_obj_pacific.strftime("%Y-%m-%d %H:%M:%S %Z%z")

Results:

2014-05-28 22:28:15 PST-0800
2014-05-28 22:28:15 PDT-0700

Convert time zones

from datetime import datetime
from pytz import timezone

fmt = "%Y-%m-%d %H:%M:%S %Z%z"

# Current time in UTC
now_utc = datetime.now(timezone('UTC'))
print now_utc.strftime(fmt)

# Convert to US/Pacific time zone
now_pacific = now_utc.astimezone(timezone('US/Pacific'))
print now_pacific.strftime(fmt)

# Convert to Europe/Berlin time zone
now_berlin = now_pacific.astimezone(timezone('Europe/Berlin'))
print now_berlin.strftime(fmt)

Results:

2009-05-06 03:09:49 UTC+0000
2009-05-05 20:09:49 PDT-0700
2009-05-06 05:09:49 CEST+0200
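
The key point of astimezone() is that it changes how the time is displayed, not the instant it refers to. Here is a minimal Python 3 sketch of that idea using only the standard library. It is not the pytz approach used above: the two fixed offsets are assumptions standing in for US/Pacific and Europe/Berlin on that date, and `timezone` here is `datetime.timezone`, not `pytz.timezone`.

```python
from datetime import datetime, timedelta, timezone

fmt = "%Y-%m-%d %H:%M:%S %Z%z"

# Fixed offsets standing in for the pytz zones (assumption: the offsets in force in May 2009).
pdt = timezone(timedelta(hours=-7), "PDT")
cest = timezone(timedelta(hours=2), "CEST")

# The UTC timestamp from the results above, instead of the current time.
now_utc = datetime(2009, 5, 6, 3, 9, 49, tzinfo=timezone.utc)
now_pacific = now_utc.astimezone(pdt)
now_berlin = now_pacific.astimezone(cest)

for dt in (now_utc, now_pacific, now_berlin):
    print(dt.strftime(fmt))

# Aware datetimes compare by instant, so all three are equal.
same_instant = now_utc == now_pacific == now_berlin
print(same_instant)
```

Fixed offsets know nothing about daylight saving transitions, which is why a time zone database such as pytz is still needed for real conversions.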

List time zones

There are 559 time zones included in pytz. Here's how to print the US time zones:

from pytz import all_timezones

print len(all_timezones)
for zone in all_timezones:
    if 'US' in zone:
        print zone

Results:

US/Alaska
US/Aleutian
US/Arizona
US/Central
US/East-Indiana
US/Eastern
US/Hawaii
US/Indiana-Starke
US/Michigan
US/Mountain
US/Pacific
US/Pacific-New
US/Samoa
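
The `'US' in zone` test matches the substring anywhere in the name. A slightly stricter variant, sketched here for Python 3, keeps only names that start with `US/`. The short `sample_zones` list is a hand-picked stand-in for `pytz.all_timezones` so the snippet runs without pytz installed.

```python
# Hand-picked sample; with pytz installed you could pass pytz.all_timezones instead.
sample_zones = [
    "America/Los_Angeles",
    "Europe/Berlin",
    "US/Alaska",
    "US/Pacific",
    "UTC",
]


def us_zones(zones):
    """Return only the zone names under the US/ prefix."""
    return [zone for zone in zones if zone.startswith("US/")]


for zone in us_zones(sample_zones):
    print(zone)
```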

How to reverse words in a sentence using Python and C

2009-04-22T09:56:32-07:00

This is a technical problem I attempted recently. The problem was to reverse the words in a sentence. For example, "The quick brown fox jumped over the lazy dog." becomes "dog. lazy the over jumped fox brown quick The". I had to solve the problem first using Python, and then using C. In addition, the C version could use only one extra character of memory. I solved the Python version easily, but the C version was too difficult for me. Here are possible solutions.

Python version

sentence = "The quick brown fox jumped over the lazy dog."
words = sentence.split()
sentence_rev = " ".join(reversed(words))
print sentence_rev

C version

Credit for this solution goes to Hai Vu.

#include <stdio.h>

/* function declarations */
void reverse_words(char *sentence);
void reverse_chars(char *left, char *right);

/* main program */
int main()
{
   char mysentence[] = "The quick brown fox jumped over the lazy dog.";

   reverse_words(mysentence);
   printf("%s\n", mysentence);

   return 0;
}

/* reverse the words in a sentence */
void reverse_words(char *sentence)
{
   char *start = sentence;
   char *end = sentence;

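   /* nothing to do for an empty sentence (also keeps end inside the array) */
   if (*sentence == '\0') {
      return;
   }

   /* find the end of the sentence */
   while (*end != '\0') {
      ++end;
   }
   --end;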

   /* reverse the characters in the sentence */
   reverse_chars(start, end);
   
   /* reverse the characters in each word */
   while (*start != '\0') {
      /* move start pointer to the beginning of the next word */
      for (; *start != '\0' && *start == ' '; start++) ;

      /* move end pointer to the end of the next word */
      for (end=start; *end != '\0' && *end != ' '; end++) ;
      --end;

      /* reverse the characters in the word */
      reverse_chars(start, end);

      /* move to next word */
      start = ++end;
   }
}

/* reverse the characters in a string */
void reverse_chars(char *left, char *right)
{
   char temp;

   while( left < right) {
      temp = *left;
      *left = *right;
      *right = temp;
      ++left;
      --right;
   }
}
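
The trick in the C version is to reverse the whole sentence first and then reverse each word in place. Here is a rough Python 3 sketch of the same two passes, working on a list of characters because Python strings are immutable. The function names mirror the C code, but this is my own transcription and not part of Hai Vu's solution.

```python
def reverse_chars(chars, left, right):
    """Reverse chars[left..right] in place, swapping from both ends."""
    while left < right:
        chars[left], chars[right] = chars[right], chars[left]
        left += 1
        right -= 1


def reverse_words(sentence):
    """Reverse the order of the words in sentence using the two-pass method."""
    chars = list(sentence)
    n = len(chars)

    # Pass 1: reverse every character in the sentence.
    reverse_chars(chars, 0, n - 1)

    # Pass 2: reverse the characters of each word back into reading order.
    start = 0
    while start < n:
        # Skip spaces to find the beginning of the next word.
        while start < n and chars[start] == " ":
            start += 1
        # Advance end to one past the last character of the word.
        end = start
        while end < n and chars[end] != " ":
            end += 1
        reverse_chars(chars, start, end - 1)
        start = end

    return "".join(chars)


print(reverse_words("The quick brown fox jumped over the lazy dog."))
```

The swap in reverse_chars is the only place a temporary value is needed, which is how the C version stays within one extra character of memory.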

How to convert a PNM file to PDF with Python

2009-04-20T22:08:10-07:00

Install the Python Imaging Library (PIL)
On Ubuntu/Debian, use:
```
sudo apt-get install python-imaging
```

Create a file called convert_pnm_to_pdf.py:

import Image
import os
import sys

filename = sys.argv[1]
try:
    newfilename = os.path.splitext(filename)[0] + ".pdf"
    Image.open(filename).save(newfilename)
    print "Converted " + newfilename
except IOError:
    print "Cannot convert " + filename

Run the script:
```
python convert_pnm_to_pdf.py yourfile.pnm
```
A PDF file named yourfile.pdf will be created.

The PIL also supports many other file formats including BMP, GIF, JPEG, PNG, and TIFF. For more information, see the Python Imaging Library Handbook.
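
For Python 3, a roughly equivalent sketch follows. It assumes Pillow, the maintained fork of PIL, is installed (it is imported as `PIL`). The helper names `pdf_name_for` and `convert_to_pdf` are my own for illustration and not part of the original script.

```python
import os


def pdf_name_for(filename):
    """Return the input path with its extension replaced by .pdf."""
    return os.path.splitext(filename)[0] + ".pdf"


def convert_to_pdf(filename):
    """Open an image with Pillow and save it as a PDF next to the original."""
    # Imported here so pdf_name_for() can be used even without Pillow installed.
    from PIL import Image

    newfilename = pdf_name_for(filename)
    with Image.open(filename) as img:
        # Pillow picks the output format from the .pdf extension.
        img.save(newfilename)
    return newfilename


print(pdf_name_for("yourfile.pnm"))
```

Calling `convert_to_pdf("yourfile.pnm")` should write yourfile.pdf. Pillow raises an OSError subclass when it cannot read the input file.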

My Python geek list

2009-04-13T17:49:55-07:00

I noticed a few Python people had switched from Google Code to github or bitbucket. Then I thought, "Hey, maybe I can gather interesting information about my favorite Python geeks in a table!" Then I started making said table. Then I thought, "This is a dumb idea!" But I decided to post the table anyways. So, here it is: a dumb table of smart people.

Blog	Net	Code	Via	Other
Adam Gomaa		bitbucket		Django People
Bob Ippolito		Google Code	simplejson
David Beazley			Python Essential Reference, Course on Coroutines, PLY
Glyph Lefkowitz			Twisted
Guido van Rossum		Google Code	Python	Wikipedia
Ian Bicking		bitbucket	SQLObject, virtualenv, pip
Jack Diederich			Class Decorators PyCon talk
Jacob Kaplan-Moss		github	Django	Django People
James Bennett		bitbucket	Django	Django People
James Tauber		github	Pinax	Django People
Simon Willison		Google Code, github	Django	Wikipedia Django People

Scripting wmii column widths with Python

2009-04-12T23:05:12-07:00

I mentioned in my previous post on using wmii with Gnome that I had written a script for resizing the column widths in wmii. This is the follow-up post. Note, I am using the 20080520 snapshot of wmii. This doesn't work with wmii 3.6 (as Marco commented below).

To incrementally change window sizes, I use the following in my ~/.wmii-3.5/wmiirc file:

	Key $MODKEY-y
		# shrink horizontally
		wmiir xwrite /tag/sel/ctl grow sel sel right -10
		wmiir xwrite /tag/sel/ctl grow sel sel left -10
	Key $MODKEY-u
		# grow horizontally
		wmiir xwrite /tag/sel/ctl grow sel sel right 10
		wmiir xwrite /tag/sel/ctl grow sel sel left 10
	Key $MODKEY-i
		# shrink vertically
		wmiir xwrite /tag/sel/ctl grow sel sel down -10
		wmiir xwrite /tag/sel/ctl grow sel sel up -10
	Key $MODKEY-o
		# grow vertically
		wmiir xwrite /tag/sel/ctl grow sel sel down 10
		wmiir xwrite /tag/sel/ctl grow sel sel up 10
	Key $MODKEY-Shift-y
		# shrink horizontally
		wmiir xwrite /tag/sel/ctl grow sel sel right -2
		wmiir xwrite /tag/sel/ctl grow sel sel left -2
	Key $MODKEY-Shift-u
		# grow horizontally
		wmiir xwrite /tag/sel/ctl grow sel sel right 2
		wmiir xwrite /tag/sel/ctl grow sel sel left 2
	Key $MODKEY-Shift-i
		# shrink vertically
		wmiir xwrite /tag/sel/ctl grow sel sel down -2
		wmiir xwrite /tag/sel/ctl grow sel sel up -2
	Key $MODKEY-Shift-o
		# grow vertically
		wmiir xwrite /tag/sel/ctl grow sel sel down 2
		wmiir xwrite /tag/sel/ctl grow sel sel up 2

In addition to incrementally changing column widths, I wanted to be able to switch to predetermined column width ratios with a keyboard shortcut. For example, I wanted to be able to set the column widths at a 20/80 ratio, a 40/60 ratio, a 50/50 ratio, a 60/40 ratio, and so on. So I hacked a Python script to do this. It is pretty ugly because I first grow the window by a set amount, measure the change in size, then grow it again to the correct width. If anyone knows of a better way to do this, please let me know. I'm posting my solution here in case anyone else wanted to do the same thing and got stuck. ~~(Note, this script only works with two columns)~~

UPDATE 2009-12-21: I just learned from the new wmii documentation that I can specify a grow amount in pixels by suffixing it with "px". This means I no longer have to perform the ugly, extra grow-then-measure step in my script. I'm not sure if this is a newly added change or if it is just newly documented. I am now using wmii 3.9b1. I have updated the script below to use the new method. Also, the script now works with more than two columns. I kept the old method for reference.

#!/usr/bin/env python

import os
import re
import sys

class Wmii:
    """
    wmiir xwrite /tag/sel/ctl grow col row side increment
    col: column number of the window to grow
    row: row number of the window to grow
    side: the side to grow. one of left, right, up, or down
    increment: the number of pixels to grow. use a positive number to grow larger
    and a negative number to grow smaller
    """
    def set_column_widths(self, width_list):
        """Use the 'grow' command to set the column widths to those specified.
        Widths are specified in percentages.
        """
        total_width_perc = sum([float(width) for width in width_list])
        for i, width_perc in enumerate(width_list[:-1]):
            self.read_current_col_widths()
            total_width_px = float(sum(self.curr_colwidths))
            new_width_px =  float(width_perc) / total_width_perc * total_width_px
            grow_amount_px = int(round(new_width_px - self.curr_colwidths[i]))
            self.xwrite("/tag/sel/ctl grow %d 1 right %dpx" % (i+1, grow_amount_px))

    def read_current_col_widths(self):
        """'wmiir read /tag/sel/index' and set the attribute, self.curr_colwidths.
        self.curr_colwidths is a list of the width (ints) (in pixels) of each
        column in the view.
        """
        lines = self.read("/tag/sel/index")
        self.curr_colwidths = []
        for line in lines:
            match = re.search(r"# [^~]+ \d+ (\d+)", line)
            if match:
                self.curr_colwidths.append(int(match.group(1)))
        print self.curr_colwidths

    def xwrite(self, path_and_value):
        """Use the xwrite form."""
        cmd = "wmiir xwrite %s" % path_and_value
        print cmd
        os.system(cmd)

    def read(self, path):
        """Return a list of the lines returned by "wmiir read path" """
        return os.popen4("wmiir read " + path)[1].readlines()

if __name__ == "__main__":
    w = Wmii()
    w.set_column_widths(sys.argv[1:])
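
The arithmetic in set_column_widths is easier to follow without the wmiir calls around it. Here is a Python 3 sketch of just that calculation. The function name `grow_amounts` is mine, and the sketch assumes that growing a column on its right side takes the pixels from the next column over, where the real script simply re-reads the widths from wmii after each grow.

```python
def grow_amounts(curr_widths_px, width_percents):
    """Return the pixel amount to grow each column (except the last) to the right."""
    widths = list(curr_widths_px)
    total_perc = sum(float(p) for p in width_percents)
    amounts = []
    for i, perc in enumerate(width_percents[:-1]):
        total_px = float(sum(widths))
        new_width_px = float(perc) / total_perc * total_px
        grow = int(round(new_width_px - widths[i]))
        amounts.append(grow)
        # Assumption: the neighbouring column gives up (or gains) the same pixels.
        widths[i] += grow
        widths[i + 1] -= grow
    return amounts


# Two 500px columns resized to a 20/80 ratio.
print(grow_amounts([500, 500], ["20", "80"]))
# Three columns resized to 50/25/25.
print(grow_amounts([400, 300, 300], ["50", "25", "25"]))
```

The last column never needs its own grow command because it ends up with whatever width is left over.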

Old method (for reference):

#!/usr/bin/env python

import os
import re
import sys

class Wmii:
    """
    wmiir xwrite /tag/sel/ctl grow col row side increment
    col: column number of the window to grow
    row: row number of the window to grow
    side: the side to grow. one of left, right, up, or down
    increment: the number of pixels to grow. use a positive number to grow larger
    and a negative number to grow smaller
    """
    def __init__(self):
        pass

    def set_column_widths(self, width0, width1):
        """Use the 'grow' command to set the column widths to those specified.
        Widths are specified in percentages.
        Currently only works with 2 columns.
        """
        self.determine_pixels_per_grow_horiz()
        new_width0 = sum(self.curr_colwidths) * (float(width0) /
                                                 (float(width0)+float(width1)))
        grow_amount = int(round((new_width0-self.curr_colwidths[0]) /
                                self.pixels_per_grow_increment))
        self.xwrite("/tag/sel/ctl grow 1 1 right %d" % grow_amount)

    def determine_pixels_per_grow_horiz(self):
        """Try growing by an increment of 1 and record the number of pixels changed.
        """
        self.read_current_col_widths()
        prev_colwidth0 = self.curr_colwidths[0]
        self.xwrite("/tag/sel/ctl grow 1 1 right 1")
        self.read_current_col_widths()
        self.pixels_per_grow_increment = self.curr_colwidths[0] - prev_colwidth0

    def read_current_col_widths(self):
        """'wmiir read /tag/sel/index' and set the attribute, self.curr_colwidths.
        self.curr_colwidths is a list of the width (ints) (in pixels) of each
        column in the view.
        """
        lines = self.read("/tag/sel/index")
        self.curr_colwidths = []
        for line in lines:
            match = re.search(r"# [^~]+ \d+ (\d+)", line)
            if match:
                self.curr_colwidths.append(int(match.group(1)))
        print self.curr_colwidths

    def read_current_column_number(self):
        """'wmiir read /tag/sel/ctl' and set the attribute, self.curr_col."""
        lines = self.read("/tag/sel/ctl")
        self.curr_col = re.split(" ", lines[1])[1]
        print "curr_col = %s" % self.curr_col

    def xwrite(self, path_and_value):
        """Use the xwrite form."""
        cmd = "wmiir xwrite %s" % path_and_value
        print cmd
        os.system(cmd)

    def read(self, path):
        """Return a list of the lines returned by "wmiir read path" """
        return os.popen4("wmiir read " + path)[1].readlines()

if __name__ == "__main__":
    w = Wmii()
    w.set_column_widths(sys.argv[1], sys.argv[2])

I named the script wmii.py, made it executable, and put it on my path. Then I modified my wmiirc:

	Key $MODKEY-q
		wmii.py 20 80
	Key $MODKEY-w
		wmii.py 40 60
	Key $MODKEY-e
		wmii.py 50 50
	Key $MODKEY-r
		wmii.py 60 40
	Key $MODKEY-t
		wmii.py 80 20

Hope that's helpful to someone. Let me know if you've written some cool Python wmii scripts.

Python datetime / time conversions

2008-11-12T16:45:42-08:00

import calendar
from datetime import datetime
import time

#-------------------------------------------------
# conversions to strings
#-------------------------------------------------
# datetime object to string
dt_obj = datetime(2008, 11, 10, 17, 53, 59)
date_str = dt_obj.strftime("%Y-%m-%d %H:%M:%S")
print date_str

# time tuple to string
time_tuple = (2008, 11, 12, 13, 51, 18, 2, 317, 0)
date_str = time.strftime("%Y-%m-%d %H:%M:%S", time_tuple)
print date_str

#-------------------------------------------------
# conversions to datetime objects
#-------------------------------------------------
# time tuple to datetime object
time_tuple = (2008, 11, 12, 13, 51, 18, 2, 317, 0)
dt_obj = datetime(*time_tuple[0:6])
print repr(dt_obj)

# date string to datetime object
date_str = "2008-11-10 17:53:59"
dt_obj = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S")
print repr(dt_obj)

# timestamp to datetime object in local time
timestamp = 1226527167.595983
dt_obj = datetime.fromtimestamp(timestamp)
print repr(dt_obj)

# timestamp to datetime object in UTC
timestamp = 1226527167.595983
dt_obj = datetime.utcfromtimestamp(timestamp)
print repr(dt_obj)

#-------------------------------------------------
# conversions to time tuples
#-------------------------------------------------
# datetime object to time tuple
dt_obj = datetime(2008, 11, 10, 17, 53, 59)
time_tuple = dt_obj.timetuple()
print repr(time_tuple)

# string to time tuple
date_str = "2008-11-10 17:53:59"
time_tuple = time.strptime(date_str, "%Y-%m-%d %H:%M:%S")
print repr(time_tuple)

# timestamp to time tuple in UTC
timestamp = 1226527167.595983
time_tuple = time.gmtime(timestamp)
print repr(time_tuple)

# timestamp to time tuple in local time
timestamp = 1226527167.595983
time_tuple = time.localtime(timestamp)
print repr(time_tuple)

#-------------------------------------------------
# conversions to timestamps
#-------------------------------------------------
# time tuple in local time to timestamp
time_tuple = (2008, 11, 12, 13, 59, 27, 2, 317, 0)
timestamp = time.mktime(time_tuple)
print repr(timestamp)

# time tuple in utc time to timestamp
time_tuple_utc = (2008, 11, 12, 13, 59, 27, 2, 317, 0)
timestamp_utc = calendar.timegm(time_tuple_utc)
print repr(timestamp_utc)

#-------------------------------------------------
# results
#-------------------------------------------------
# 2008-11-10 17:53:59
# 2008-11-12 13:51:18
# datetime.datetime(2008, 11, 12, 13, 51, 18)
# datetime.datetime(2008, 11, 10, 17, 53, 59)
# datetime.datetime(2008, 11, 12, 13, 59, 27, 595983)
# datetime.datetime(2008, 11, 12, 21, 59, 27, 595983)
# (2008, 11, 10, 17, 53, 59, 0, 315, -1)
# (2008, 11, 10, 17, 53, 59, 0, 315, -1)
# (2008, 11, 12, 21, 59, 27, 2, 317, 0)
# (2008, 11, 12, 13, 59, 27, 2, 317, 0)
# 1226527167.0
# 1226498367
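
The local-time results above depend on the machine's timezone (mine was UTC-8 at the time). For a round trip that gives the same answer everywhere, here is a small sketch that stays in UTC. It is written for Python 3, unlike the Python 2 listing above, and the variable names are just for illustration.

```python
import calendar
import time
from datetime import datetime, timezone

timestamp = 1226527167

# timestamp -> UTC time tuple -> timestamp
time_tuple_utc = time.gmtime(timestamp)
assert calendar.timegm(time_tuple_utc) == timestamp

# timestamp -> timezone-aware datetime in UTC -> timestamp
dt_utc = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(dt_utc.strftime("%Y-%m-%d %H:%M:%S"))  # 2008-11-12 21:59:27
assert int(dt_utc.timestamp()) == timestamp
```

Note that calendar.timegm is the UTC counterpart of time.mktime, which always interprets the tuple as local time.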

Python UnicodeEncodeError: 'ascii' codec can't encode character

2008-11-06T00:07:47-08:00

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' 
in position 0: ordinal not in range(128)

If you've ever gotten this error, Django's smart_str function might be able to help. I found this from James Bennett's article, Unicode in the real world. He provides a very good explanation of Python's Unicode and bytestrings, their use in Django, and using Django's Unicode utilities for working with non-Unicode-friendly Python libraries. Here are my notes from his article as it applies to the above error. Much of the wording is directly from James Bennett's article.

This error occurs when you pass a Unicode string containing non-ASCII characters (code points above 127) to something that expects an ASCII bytestring. The default encoding for a Python 2 bytestring is ASCII, "which handles exactly 128 (English) characters". This is why trying to convert characters with code points above 127 produces the error.

The good news is that you can encode Python bytestrings in other encodings besides ASCII. Django's smart_str function, in the django.utils.encoding module, converts a Unicode string to a bytestring using a default encoding of UTF-8.

Here is an example using the built-in function, str:

a = u'\xa1'
print str(a) # this throws an exception

Results:

Traceback (most recent call last):
  File "unicode_ex.py", line 3, in <module>
    print str(a) # this throws an exception
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)

Here is an example using smart_str:

from django.utils.encoding import smart_str, smart_unicode

a = u'\xa1'
print smart_str(a)

Results:

¡
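
For comparison, here is a rough sketch of the same situation in Python 3 without Django. In Python 3, str is always Unicode and bytes is a separate type, so the implicit ASCII conversion no longer happens, but an explicit encode to ASCII reproduces the error. Encoding explicitly to UTF-8 is roughly what smart_str does by default, as described above. The variable names are only for illustration.

```python
text = "\xa1"  # the inverted exclamation mark, code point 161

# Encoding to ASCII fails because 161 is outside the 0-127 range.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error_message = str(exc)
    print(error_message)

# Encoding to UTF-8 works and produces two bytes.
encoded = text.encode("utf-8")
print(encoded)  # b'\xc2\xa1'
```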

Definitions

Unicode string: sequence of Unicode characters
Python bytestring: a series of bytes which represent a sequence of characters. Its default encoding is ASCII. This is the "normal", non-Unicode string in Python <3.0.
encoding: a code that pairs a sequence of characters with a series of bytes
ASCII: an encoding which handles 128 English characters
UTF-8: a popular encoding used for Unicode strings which is backwards compatible with ASCII for the first 128 characters. It uses one to four bytes for each character.
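
To make the "one to four bytes" part concrete, here is a quick Python 3 sketch that encodes four sample characters and counts the bytes. The sample characters are my own picks for illustration.

```python
# One sample character for each UTF-8 length.
samples = ["i", "\xa1", "\u20ac", "\U0001f600"]

for ch in samples:
    print("U+%04X" % ord(ch), len(ch.encode("utf-8")))

byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(byte_lengths)  # [1, 2, 3, 4]

# For the first 128 code points, UTF-8 and ASCII give identical bytes.
assert "i".encode("utf-8") == "i".encode("ascii")
```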

Operations related to str and unicode objects

unicode.encode() - converts to str
str.decode() - converts to unicode
unicode(str, encoding) - converts to unicode
ord(c) - returns the Unicode code point of the character
chr(i) - returns a str object for the given ASCII code (inverse of ord() for 8-bit strings)
unichr(i) - returns a unicode object for the given Unicode code (inverse of ord() for Unicode strings)
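
The names above are for Python 2. In Python 3 the unicode type became str, the old str became bytes, and unichr became chr. Here is a short sketch of the same operations under the Python 3 names, using Latin-1 only because it maps code point 161 to a single byte.

```python
# ord() and chr() are inverses (Python 2: ord() and unichr())
assert ord("\xa1") == 161
assert chr(161) == "\xa1"

# str.encode() converts to bytes (Python 2: unicode.encode() -> str)
data = "\xa1".encode("latin-1")
assert data == b"\xa1"

# bytes.decode() converts to str (Python 2: str.decode() -> unicode)
assert data.decode("latin-1") == "\xa1"

# str(bytes, encoding) also converts to str (Python 2: unicode(str, encoding))
assert str(data, "latin-1") == "\xa1"
```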

Table of operations on str types (Python 2.7)

| x ->                      | 'i'              | '¡'              | '\xa1'       | chr(105)         | u'\xa1'.encode('utf-8') |
|---------------------------+------------------+------------------+--------------+------------------+-------------------------|
| type(x)                   | <type 'str'>     | <type 'str'>     | <type 'str'> | <type 'str'>     | <type 'str'>            |
| ord(x)                    | 105              | NA               | 161          | 105              | NA                      |
| type(str(x))              | <type 'str'>     | <type 'str'>     | <type 'str'> | <type 'str'>     | <type 'str'>            |
| type(unicode(x))          | <type 'unicode'> | DecodeError      | DecodeError  | <type 'unicode'> | DecodeError             |
| type(unicode(x, 'utf-8')) | <type 'unicode'> | <type 'unicode'> | DecodeError  | <type 'unicode'> | <type 'unicode'>        |
| type(unicode(x, 'ascii')) | <type 'unicode'> | DecodeError      | DecodeError  | <type 'unicode'> | DecodeError             |
| type(x.decode('utf-8'))   | <type 'unicode'> | <type 'unicode'> | DecodeError  | <type 'unicode'> | <type 'unicode'>        |
| type(x.encode('utf-8'))   | <type 'str'>     | DecodeError      | DecodeError  | <type 'str'>     | DecodeError             |
| type(x.decode('ascii'))   | <type 'unicode'> | DecodeError      | DecodeError  | <type 'unicode'> | DecodeError             |
| type(x.encode('ascii'))   | <type 'str'>     | DecodeError      | DecodeError  | <type 'str'>     | DecodeError             |

Table of operations on unicode types (Python 2.7)

| x ->                      | u'i'             | u'¡'             | u'\xa1'          | unichr(161)      | '¡'.decode('utf-8') |
|---------------------------+------------------+------------------+------------------+------------------+---------------------|
| type(x)                   | <type 'unicode'> | <type 'unicode'> | <type 'unicode'> | <type 'unicode'> | <type 'unicode'>    |
| ord(x)                    | 105              | 161              | 161              | 161              | 161                 |
| type(str(x))              | <type 'str'>     | EncodeError      | EncodeError      | EncodeError      | EncodeError         |
| type(unicode(x))          | <type 'unicode'> | <type 'unicode'> | <type 'unicode'> | <type 'unicode'> | <type 'unicode'>    |
| type(unicode(x, 'utf-8')) | <not supported>  | <not supported>  | <not supported>  | <not supported>  | <not supported>     |
| type(unicode(x, 'ascii')) | <not supported>  | <not supported>  | <not supported>  | <not supported>  | <not supported>     |
| type(x.decode('utf-8'))   | <type 'unicode'> | EncodeError      | EncodeError      | EncodeError      | EncodeError         |
| type(x.encode('utf-8'))   | <type 'str'>     | <type 'str'>     | <type 'str'>     | <type 'str'>     | <type 'str'>        |
| type(x.decode('ascii'))   | <type 'unicode'> | EncodeError      | EncodeError      | EncodeError      | EncodeError         |
| type(x.encode('ascii'))   | <type 'str'>     | EncodeError      | EncodeError      | EncodeError      | EncodeError         |

Unicode unit tests (Python 2.7)

import io
import os.path
import shutil
import tempfile
import unittest


class UnicodeTestCase(unittest.TestCase):
    codepoint105_as_unicode = unichr(105)
    codepoint105_as_bytestring = 'i'
    codepoint105_as_bytestring_ascii = unichr(105).encode('ascii')
    codepoint105_as_bytestring_utf8 = unichr(105).encode('utf-8')
    codepoint105_as_bytestring_latin1 = unichr(105).encode('latin-1')
    codepoint105_as_bytestring_cp950 = unichr(105).encode('cp950')

    codepoint161_as_unicode = unichr(161)
    codepoint161_as_bytestring_utf8 = unichr(161).encode('utf-8')
    codepoint161_as_bytestring_latin1 = unichr(161).encode('latin-1')

    def setUp(self):
        self.tempdir = tempfile.mkdtemp(prefix='tmp-ditest-')
        self.codepoint105_ascii_filepath = os.path.join(self.tempdir, 'codepoint105_ascii')
        self.codepoint105_utf8_filepath = os.path.join(self.tempdir, 'codepoint105_utf8')
        self.codepoint105_latin1_filepath = os.path.join(self.tempdir, 'codepoint105_latin1')
        self.codepoint161_ascii_filepath = 'codepoint 161 cannot be encoded using ascii'
        self.codepoint161_utf8_filepath = os.path.join(self.tempdir, 'codepoint161_utf8')
        self.codepoint161_latin1_filepath = os.path.join(self.tempdir, 'codepoint161_latin1')

        with io.open(self.codepoint105_ascii_filepath, 'w', encoding='ascii') as f:
            f.write(self.codepoint105_as_unicode)
        with io.open(self.codepoint105_utf8_filepath, 'w', encoding='utf8') as f:
            f.write(self.codepoint105_as_unicode)
        with io.open(self.codepoint105_latin1_filepath, 'w', encoding='latin1') as f:
            f.write(self.codepoint105_as_unicode)

        with io.open(self.codepoint161_utf8_filepath, 'w', encoding='utf8') as f:
            f.write(self.codepoint161_as_unicode)
        with io.open(self.codepoint161_latin1_filepath, 'w', encoding='latin1') as f:
            f.write(self.codepoint161_as_unicode)

    def tearDown(self):
        shutil.rmtree(self.tempdir)

    def test_encoding_decoding_latin1_utf8(self):
        self.assertEqual(u'\xa1', unichr(161))
        self.assertEqual(
            unichr(161).encode('utf-8').decode('utf-8'),
            unichr(161))
        self.assertEqual(
            unichr(161).encode('latin-1').decode('latin-1'),
            unichr(161))
        self.assertNotEqual(
            unichr(161).encode('utf-8').decode('latin-1'),
            unichr(161))
        with self.assertRaises(UnicodeDecodeError):
            unichr(161).encode('latin-1').decode('utf-8')

    def test_bif_open_read(self):
        with open(self.codepoint161_utf8_filepath) as f:
            text = f.read()
            self.assertEqual(text, self.codepoint161_as_bytestring_utf8)
            self.assertEqual(type(text), type(self.codepoint161_as_bytestring_utf8))
        with open(self.codepoint161_latin1_filepath) as f:
            text = f.read()
            self.assertEqual(text, self.codepoint161_as_bytestring_latin1)
            self.assertEqual(type(text), type(self.codepoint161_as_bytestring_latin1))

    def test_io_open_utf8_read(self):
        with io.open(self.codepoint161_utf8_filepath, encoding='utf-8') as f:
            text = f.read()
            self.assertEqual(text, self.codepoint161_as_unicode)
            self.assertEqual(type(text), type(self.codepoint161_as_unicode))
        with io.open(self.codepoint161_latin1_filepath, encoding='utf-8') as f:
            with self.assertRaises(UnicodeDecodeError):
                f.read()

    def test_io_open_latin1_read(self):
        with io.open(self.codepoint161_utf8_filepath, encoding='latin-1') as f:
            text = f.read()
            self.assertNotEqual(text, self.codepoint161_as_unicode)
        with io.open(self.codepoint161_latin1_filepath, encoding='latin-1') as f:
            text = f.read()
            self.assertEqual(text, self.codepoint161_as_unicode)
            self.assertEqual(type(text), type(self.codepoint161_as_unicode))

    def test_bif_open_write(self):
        with open('test.txt', 'w') as f:
            f.write(self.codepoint105_as_bytestring)
            f.write(self.codepoint105_as_unicode)
            f.write(self.codepoint161_as_bytestring_utf8)
            f.write(self.codepoint161_as_bytestring_latin1)
            with self.assertRaises(UnicodeEncodeError):
                f.write(self.codepoint161_as_unicode)

    def test_io_open_write(self):
        with io.open('test.txt', 'w') as f:
            f.write(self.codepoint105_as_unicode)
            f.write(self.codepoint161_as_unicode)

            with self.assertRaises(TypeError):
                f.write(self.codepoint105_as_bytestring)
            with self.assertRaises(TypeError):
                f.write(self.codepoint161_as_bytestring_utf8)
            with self.assertRaises(TypeError):
                f.write(self.codepoint161_as_bytestring_latin1)

    def test_io_open_utf8_write(self):
        with io.open('test.txt', 'w', encoding='utf-8') as f:
            f.write(self.codepoint105_as_unicode)
            f.write(self.codepoint161_as_unicode)

            with self.assertRaises(TypeError):
                f.write(self.codepoint105_as_bytestring)
            with self.assertRaises(TypeError):
                f.write(self.codepoint161_as_bytestring_utf8)
            with self.assertRaises(TypeError):
                f.write(self.codepoint161_as_bytestring_latin1)

    def test_io_open_latin1_write(self):
        with io.open('test.txt', 'w', encoding='latin-1') as f:
            f.write(self.codepoint105_as_unicode)
            f.write(self.codepoint161_as_unicode)

            with self.assertRaises(TypeError):
                f.write(self.codepoint105_as_bytestring)
            with self.assertRaises(TypeError):
                f.write(self.codepoint161_as_bytestring_utf8)
            with self.assertRaises(TypeError):
                f.write(self.codepoint161_as_bytestring_latin1)
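
The encode/decode mismatch checked in test_encoding_decoding_latin1_utf8 is easy to see outside of unittest too. Here is a minimal Python 3 sketch (the tests above are Python 2, so unichr becomes chr here). The file name and variable names are only for illustration.

```python
import io
import os
import tempfile

char = chr(161)
utf8_bytes = char.encode("utf-8")      # two bytes
latin1_bytes = char.encode("latin-1")  # one byte

# Decoding UTF-8 bytes as Latin-1 "works" but gives the wrong text.
mojibake = utf8_bytes.decode("latin-1")
print(ascii(mojibake))  # '\xc2\xa1'

# Decoding Latin-1 bytes as UTF-8 fails outright.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# io.open with an encoding writes the encoded bytes to disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "codepoint161_latin1")
    with io.open(path, "w", encoding="latin-1") as f:
        f.write(char)
    with io.open(path, "rb") as f:
        raw = f.read()
print(raw)  # b'\xa1'
```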

How to escape (percent-encode) a URL with Python

2008-10-29T16:44:01-07:00

import urllib

print urllib.quote_plus("http://www.yahoo.com/")
print urllib.quote_plus("Kruder & Dorfmeister")

Results:

http%3A%2F%2Fwww.yahoo.com%2F
Kruder+%26+Dorfmeister

It is easy to be drawn to the urlencode function in the Python urllib module documentation. But for simple escaping, only quote_plus (or possibly quote) is needed. I believe this is the appropriate solution to Python urlencode annoyance and O'Reilly's Amazon Hack #92.
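
In Python 3 these functions moved to urllib.parse. Here is a short sketch comparing quote_plus, quote, and urlencode on the same input. The "q" parameter name is just an example.

```python
from urllib.parse import quote, quote_plus, urlencode

# quote_plus escapes "/" and turns spaces into "+"
print(quote_plus("http://www.yahoo.com/"))  # http%3A%2F%2Fwww.yahoo.com%2F
print(quote_plus("Kruder & Dorfmeister"))   # Kruder+%26+Dorfmeister

# quote leaves "/" alone by default and turns spaces into "%20"
print(quote("http://www.yahoo.com/"))       # http%3A//www.yahoo.com/
print(quote("Kruder & Dorfmeister"))        # Kruder%20%26%20Dorfmeister

# urlencode is for building a whole query string from key/value pairs
print(urlencode({"q": "Kruder & Dorfmeister"}))  # q=Kruder+%26+Dorfmeister
```

So quote_plus is the one to reach for when escaping a single value that goes into a query string, and urlencode when you have a dict of parameters.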

For reference: Percent-encoding on Wikipedia

Running a Twisted Perspective Broker example with twistd

2008-10-27T00:26:00-07:00

I've been using Twisted's Perspective Broker to manage networking for my Python program. Perspective Broker allows me to run a Python program on a remote machine and perform remote method calls on an object in the Python program. It also allows me to serialize objects and transfer them over TCP.

Once I got a Perspective Broker server and client running, I wanted to create a "Twisted Application" and run it using twistd, the Twisted Daemon. Two major options are: creating a .tac file and creating a twistd plugin. Below, I show the steps I took to create a .tac application script and a twistd plugin using a simple Perspective Broker example.

Basic Perspective Broker server and client

Here is a simple Perspective Broker server and client example which I adapted from Twisted's examples page. The PB server code lives in pbsimpleserver.py:

from twisted.spread import pb
from twisted.internet import reactor

class Echoer(pb.Root):
    def remote_echo(self, st):
        print 'echoing:', st
        return st

if __name__ == '__main__':
    serverfactory = pb.PBServerFactory(Echoer())
    reactor.listenTCP(8789, serverfactory)
    reactor.run()

Here is a simple PB client, pbsimpleclient.py:

from twisted.spread import pb
from twisted.internet import reactor

class EchoClient(object):
    def connect(self):
        clientfactory = pb.PBClientFactory()
        reactor.connectTCP("localhost", 8789, clientfactory)
        d = clientfactory.getRootObject()
        d.addCallback(self.send_msg)
        
    def send_msg(self, result):
        d = result.callRemote("echo", "hello network")
        d.addCallback(self.get_msg)

    def get_msg(self, result):
        print "server echoed: ", result
        
if __name__ == '__main__':
    EchoClient().connect()
    reactor.run()

This code connects the client to the server using port 8789. It sends a message by calling the server's remote_echo method with an argument of "hello network". The server prints the message and returns the same message to the client. The client then prints the message.

To test the code, I ran both client and server on my local machine. I ran python pbsimpleserver.py in one terminal shell, then ran python pbsimpleclient.py in another shell. In the server shell, I got:

echoing: hello network

and in the client shell I got:

server echoed:  hello network

To stop the client and server, I hit CTRL-C in both shells.

Converting to a Twisted Application (.tac script)

To create a Twisted application, I used the twisted.application.service.Application object. When converting the example code above to an Application, I replaced twisted.internet.reactor.listenTCP with twisted.application.internet.TCPServer and twisted.internet.reactor.connectTCP with twisted.application.internet.TCPClient. Below is my server application, pbsimpleserver_app.py. (Note: the two files below are considered .tac files even though their filenames don't end in .tac. From the documentation, .tac files are Python files which configure an Application object and assign this object to the top-level variable "application".)

from twisted.spread import pb
from twisted.application.internet import TCPServer
from twisted.application.service import Application

class Echoer(pb.Root):
    def remote_echo(self, st):
        print 'echoing:', st
        return st

serverfactory = pb.PBServerFactory(Echoer())
application = Application("echo")
echoServerService = TCPServer(8789, serverfactory)
echoServerService.setServiceParent(application)

Here is my client application, pbsimpleclient_app.py:

from twisted.spread import pb
from twisted.application.internet import TCPClient
from twisted.application.service import Application

class EchoClient(object):
    def send_msg(self, result):
        d = result.callRemote("echo", "hello network")
        d.addCallback(self.get_msg)

    def get_msg(self, result):
        print "server echoed: ", result

e = EchoClient()
clientfactory = pb.PBClientFactory()
d = clientfactory.getRootObject()
d.addCallback(e.send_msg)        

application = Application("echo")
echoClientService = TCPClient("localhost", 8789, clientfactory)
echoClientService.setServiceParent(application)

To run these as daemons with twistd, I executed:

twistd -l server.log --pidfile server.pid -y pbsimpleserver_app.py

then:

twistd -l client.log --pidfile client.pid -y pbsimpleclient_app.py

This created the log file, server.log:

2008-10-27 00:08:35-0700 [-] Log opened.
2008-10-27 00:08:35-0700 [-] twistd 8.1.0 (/usr/bin/python 2.5.2) starting up
2008-10-27 00:08:35-0700 [-] reactor class: 
2008-10-27 00:08:35-0700 [-] twisted.spread.pb.PBServerFactory starting on 8789
2008-10-27 00:08:35-0700 [-] Starting factory 
2008-10-27 00:08:53-0700 [Broker,0,127.0.0.1] echoing: hello network

and client.log:

2008-10-27 00:08:53-0700 [-] Log opened.
2008-10-27 00:08:53-0700 [-] twistd 8.1.0 (/usr/bin/python 2.5.2) starting up
2008-10-27 00:08:53-0700 [-] reactor class: 
2008-10-27 00:08:53-0700 [-] Starting factory 
2008-10-27 00:08:53-0700 [Broker,client] server echoed:  hello network

Creating a twistd plugin

In order to pass command line arguments to my Twisted application daemon, I needed to create a twistd plugin. The following is how I implemented my plugin after reading the Writing a twistd plugin documentation. Here is my directory structure. EchoProj is located in ~/Projects.

EchoProj
|-- echoproj
|   |-- __init__.py
|   |-- pbsimpleclient.py
|   `-- pbsimpleserver.py
`-- twisted
    `-- plugins
        |-- echoclient_plugin.py
        `-- echoserver_plugin.py

pbsimpleserver.py:

from twisted.spread import pb

class EchoServer(pb.Root):
    def remote_echo(self, st):
        print 'echoing:', st
        return st

pbsimpleclient.py:

class EchoClient(object):
    def send_msg(self, result):
        d = result.callRemote("echo", "hello network")
        d.addCallback(self.get_msg)

    def get_msg(self, result):
        print "server echoed: ", result

echoserver_plugin.py:

from zope.interface import implements
from twisted.python import usage
from twisted.plugin import IPlugin
from twisted.application.service import IServiceMaker
from twisted.application.internet import TCPServer
from twisted.spread import pb
from echoproj.pbsimpleserver import EchoServer

class Options(usage.Options):
    optParameters = [["port", "p", 8789, "The port number to listen on."]]

class MyServiceMaker(object):
    implements(IServiceMaker, IPlugin)
    tapname = "echoserver"
    description = "Echo Server"
    options = Options

    def makeService(self, options):
        serverfactory = pb.PBServerFactory(EchoServer())
        return TCPServer(int(options["port"]), serverfactory)

serviceMaker = MyServiceMaker()

echoclient_plugin.py:

from zope.interface import implements
from twisted.python import usage
from twisted.plugin import IPlugin
from twisted.application.service import IServiceMaker
from twisted.application.internet import TCPClient
from twisted.spread import pb
from echoproj.pbsimpleclient import EchoClient

class Options(usage.Options):
    optParameters = [["port", "p", 8789, "The port number to connect to."],
                     ["host", "h", "localhost", "The host machine to connect to."]
                     ]

class MyServiceMaker(object):
    implements(IServiceMaker, IPlugin)
    tapname = "echoclient"
    description = "Echo Client"
    options = Options

    def makeService(self, options):
        e = EchoClient()
        clientfactory = pb.PBClientFactory()
        d = clientfactory.getRootObject()
        d.addCallback(e.send_msg)
        return TCPClient(options["host"], int(options["port"]), clientfactory)

serviceMaker = MyServiceMaker()

I set the PYTHONPATH to include the top-level project directory:

export PYTHONPATH="$HOME/Projects/EchoProj:$PYTHONPATH"

Running twistd --help now showed "echoserver" and "echoclient" in the list of commands. To run my server and client as daemons using port 8790 on my local machine, I executed:

twistd -l server.log --pidfile server.pid echoserver -p 8790

and

twistd -l client.log --pidfile client.pid echoclient -p 8790

This produced the logfiles, server.log:

2008-10-27 11:49:12-0700 [-] Log opened.
2008-10-27 11:49:12-0700 [-] twistd 8.1.0 (/usr/bin/python 2.5.2) starting up
2008-10-27 11:49:12-0700 [-] reactor class: 
2008-10-27 11:49:12-0700 [-] twisted.spread.pb.PBServerFactory starting on 8790
2008-10-27 11:49:12-0700 [-] Starting factory 
2008-10-27 11:49:17-0700 [Broker,0,127.0.0.1] echoing: hello network

and client.log:

2008-10-27 11:49:17-0700 [-] Log opened.
2008-10-27 11:49:17-0700 [-] twistd 8.1.0 (/usr/bin/python 2.5.2) starting up
2008-10-27 11:49:17-0700 [-] reactor class: 
2008-10-27 11:49:17-0700 [-] Starting factory 
2008-10-27 11:49:17-0700 [Broker,client] server echoed:  hello network

Twisted links

2008-10-21T17:01:20-07:00

Twisted tutorials

Twisted - hello, asynchronous programming
by Jesse Noller, February 11, 2009
Overview of Twisted concepts
Twisted inlineCallbacks and deferredGenerator
by Marcin Kasperski, August 13, 2008
Examples comparing raw deferreds, deferredGenerator, and inlineCallbacks.
Async Batching with Twisted: A Walkthrough
by Duncan McGreggor, June 20, 2008
This article features 8 easy to understand examples demonstrating Deferreds, callbacks, DeferredLists, DeferredSemaphores, and task.Cooperator.
Concurrency with Python, Twisted, and Flex
by Bruce Eckel, May 3, 2008
Grokking Twisted
by Bruce Eckel, April 15, 2006
Event-Driven Programming with Twisted and Python
Linux Journal, January 26, 2005
The Twisted Matrix Framework: Part One, Understanding Asynchronous Networking
by David Mertz, May 2003 (See also Part 2, Part 3, and Part 4)

Twisted philosophy

Constructive Criticism
by Glyph Lefkowitz, July 2, 2008
So You Want Your Code to Be Asynchronous? A Twisted Interview
by Duncan McGreggor, June 27, 2008
Knowing Santa Claus is Fake Doesn't Ruin Christmas
by Glyph Lefkowitz, August 19, 2005
Thinking in Twisted
by Duncan McGreggor, June 13, 2005

Can't block for a Deferred in Twisted

2008-10-20T14:51:29-07:00

Despite the existence of the promising waitForDeferred/deferredGenerator and the newer inlineCallbacks, it appears there is no way to block while waiting for a Deferred. Brian Granger described the problem on the Twisted mailing list:

I have a function that returns a Deferred. I need to have the result of this Deferred returned in a (apparently) blocking/synchronous manner:
def myfuncBlocking():
  d = myfuncReturnsDeferred()
  ...
  result =

  return result
I need to be able to call this function like:
result = myfuncBlocking()
The question is how to get the result out of the Deferred() and make it *look* like myfuncBlocking() has blocked.

glyph provided the succinct answer (as well as an interesting commentary on using Twisted the wrong way).

This issue has been discussed repeatedly - long story short, it's just a bad idea.

Hmmm, maybe learning Twisted will be harder than I thought.

Update 2008-10-20: Marcin Kasperski wrote a good example comparing raw deferreds, deferred generators, and inline callbacks.

Running functions periodically using Twisted's LoopingCall

2008-10-14T16:00:12-07:00

Twisted is pretty cool. It is very powerful, but I haven't had the easiest time learning it. Here is a simple example that runs a couple of functions periodically (at different rates) using LoopingCall.

For more information, here are the Twisted docs for LoopingCall.

from datetime import datetime
from twisted.internet.task import LoopingCall
from twisted.internet import reactor

def hyper_task():
    print "I like to run fast", datetime.now()

def tired_task():
    print "I want to run slowly", datetime.now()

lc = LoopingCall(hyper_task)
lc.start(0.1)

lc2 = LoopingCall(tired_task)
lc2.start(0.5)

reactor.run()

Results:

I like to run fast 2008-10-14 15:51:02.449537
I want to run slowly 2008-10-14 15:51:02.449915
I like to run fast 2008-10-14 15:51:02.551972
I like to run fast 2008-10-14 15:51:02.652013
I like to run fast 2008-10-14 15:51:02.752006
I like to run fast 2008-10-14 15:51:02.852008
I like to run fast 2008-10-14 15:51:02.952487
I want to run slowly 2008-10-14 15:51:02.952681
I like to run fast 2008-10-14 15:51:03.052012
I like to run fast 2008-10-14 15:51:03.152012
I like to run fast 2008-10-14 15:51:03.252010
I like to run fast 2008-10-14 15:51:03.352009
I like to run fast 2008-10-14 15:51:03.452008
I want to run slowly 2008-10-14 15:51:03.452206
I like to run fast 2008-10-14 15:51:03.552009
I like to run fast 2008-10-14 15:51:03.652013

Using TimerService with twistd

To create a daemon with twistd that achieves the same effect, use TimerService. TimerService runs LoopingCall under the hood. It is meant to be used with the Twisted Application infrastructure. See also the documentation on TimerService.

timerservice_ex.py

from datetime import datetime
from twisted.application import service
from twisted.application.internet import TimerService

def tired_task():
    print "I want to run slowly", datetime.now()

application = service.Application("myapp")
ts = TimerService(0.5, tired_task)
ts.setServiceParent(application)

Run it:

$ twistd -y timerservice_ex.py

Console output is stored in twistd.log:

2010-09-20 18:53:50-0700 [-] Log opened.
2010-09-20 18:53:50-0700 [-] using set_wakeup_fd
2010-09-20 18:53:50-0700 [-] twistd 10.1.0 (/home/saltycrane/.virtualenvs/default/bin/python 2.6.5) starting up.
2010-09-20 18:53:50-0700 [-] reactor class: twisted.internet.selectreactor.SelectReactor.
2010-09-20 18:53:50-0700 [-] I want to run slowly 2010-09-20 18:53:50.896477
2010-09-20 18:53:51-0700 [-] I want to run slowly 2010-09-20 18:53:51.397043
2010-09-20 18:53:51-0700 [-] I want to run slowly 2010-09-20 18:53:51.897087
2010-09-20 18:53:52-0700 [-] I want to run slowly 2010-09-20 18:53:52.397047
2010-09-20 18:53:52-0700 [-] I want to run slowly 2010-09-20 18:53:52.897068
2010-09-20 18:53:53-0700 [-] I want to run slowly 2010-09-20 18:53:53.397073
2010-09-20 18:53:53-0700 [-] I want to run slowly 2010-09-20 18:53:53.897032
2010-09-20 18:53:54-0700 [-] I want to run slowly 2010-09-20 18:53:54.397083

Installing beanstalkd and pybeanstalk on Ubuntu

2008-10-10T10:06:25-07:00

Nikolay pointed me at beanstalkd, a lightweight message queue service partially inspired by the popular caching system memcached. It features a blocking "reserve" call so workers don't need to poll for new jobs. However, the queue is stored in memory, so some might miss data persistence. beanstalkd has client libraries in Erlang, Perl, PHP, Python, and Ruby. Of course, I will use the Python version, pybeanstalk. Simon Willison also found beanstalkd interesting. (Credit to him for some of the words above.)

Update 2008-10-13: For a helpful example on using beanstalkd and pybeanstalk see Parand's beanstalkd tutorial.

Here is how I installed the beanstalkd server and pybeanstalk client on Ubuntu. I found no Ubuntu package for beanstalkd, so I installed from source.

Install beanstalkd

Install prerequisites

$ sudo apt-get install libevent1 libevent-dev

Download

$ cd ~/lib
$ wget http://xph.us/software/beanstalkd/rel/beanstalkd-1.0.tar.gz

Unpack
```
$ tar zxvf beanstalkd-1.0.tar.gz
```
Make
```
$ cd beanstalkd-1.0
$ make
```
Print help
```
$ ./beanstalkd -h
```

Install PyYAML

Download

$ wget http://pyyaml.org/download/pyyaml/PyYAML-3.06.tar.gz

Unpack
```
$ tar zxvf PyYAML-3.06.tar.gz
```
Put PyYAML-3.06/lib/yaml somewhere on your Python path or run python setup.py install.

Install pybeanstalk

Download

$ wget http://pybeanstalk.googlecode.com/files/pybeanstalk-0.11.1.tar.gz

Unpack
```
$ tar zxvf pybeanstalk-0.11.1.tar.gz
```
Put pybeanstalk-0.11.1/beanstalk somewhere on your Python path or run python setup.py install.

Run beanstalkd server

$ ~/lib/beanstalkd-1.0/beanstalkd -d -l 127.0.0.5 -p 11300

Run test client

Create a file and run it:

from beanstalk import serverconn
from beanstalk import job

SERVER = '127.0.0.5'
PORT = 11300

# setup connection
connection = serverconn.ServerConn(SERVER, PORT)
connection.job = job.Job

# produce data
for i in range(5):
    print 'put data: %d' % i
    data = job.Job(data=str(i), conn=connection)
    data.Queue()

# consume data
while True:
    j = connection.reserve()
    print 'got data: %s' % j.data
    j.Finish()

Results:

put data: 0
put data: 1
put data: 2
put data: 3
put data: 4
got data: 0
got data: 1
got data: 2
got data: 3
got data: 4
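
The blocking "reserve" call is what lets workers avoid polling. As a rough in-process analogy only (this uses Python's standard-library queue module, not beanstalkd or pybeanstalk), queue.Queue.get() blocks in the same way until a producer puts a job. The names worker, jobs, and results are made up for this sketch, which is written for Python 3.

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    while True:
        data = jobs.get()      # blocks until a job is available; no polling loop
        if data is None:       # sentinel value: stop the worker
            break
        results.append('got data: %s' % data)

t = threading.Thread(target=worker)
t.start()

for i in range(5):
    jobs.put(str(i))           # produce data
jobs.put(None)                 # tell the worker to finish
t.join()

print('\n'.join(results))
```

Unlike beanstalkd, this queue lives inside a single process, so it only illustrates the blocking behavior, not the client/server setup.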

Installing Python 2.6 from source on Ubuntu Hardy

2008-10-02T15:42:03-07:00

Python 2.6 was released yesterday! This version aims to smooth the transition from Python 2.5 to Python 3.0, which is planned for release soon (currently available as a release candidate). Python 3.0 will break backwards compatibility with the 2.x series. Python 2.6 is backwards compatible with 2.5. All the backwards compatible features of 3.0 have been backported to 2.6.

One of the new 2.6 features I'm particularly interested in is the new multiprocessing module, which has a similar interface to the threading module but uses processes instead of threads. This avoids the limitations imposed by the Global Interpreter Lock in a multi-threaded Python program. Yet it still has the nice communications and management features like Pipe()s, Queues, Pools, etc. I didn't plan to focus so much on the multiprocessing module in this post-- I just want to document my install notes on Ubuntu Linux. For all the new features in 2.6 see What's New in Python 2.6. (It is a long list.)

Download the Python 2.6 compressed source tarball

$ cd incoming
$ wget http://www.python.org/ftp/python/2.6/Python-2.6.tgz

Unpack
```
$ tar zxvf Python-2.6.tgz
```
Read the README at ~/incoming/Python-2.6/README

Install prerequisites (Disclaimer: I know nothing about libraries, packages, dependencies, etc. This is what I did-- I am not sure if I grabbed the correct prerequisites or not.)

$ sudo apt-get install build-essential
$ sudo apt-get install libncursesw5-dev
$ sudo apt-get install libreadline5-dev
$ sudo apt-get install libssl-dev
$ sudo apt-get install libgdbm-dev
$ sudo apt-get install libbz2-dev
$ sudo apt-get install libc6-dev
$ sudo apt-get install libsqlite3-dev
$ sudo apt-get install tk-dev

Configure. I am installing to ~/lib/python2.6.

$ cd Python-2.6
$ ./configure --prefix=/home/sofeng/lib/python2.6

Make

$ make

Note I got the following message:

Failed to find the necessary bits to build these modules:
bsddb185           sunaudiodev   
To find the necessary bits, look in setup.py in detect_modules() for the module's name.

I was not able to find the Ubuntu packages for these.

Try it out (Optional)

$ ./python

Python 2.6 (r26:66714, Oct  2 2008, 15:32:46) 
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

Test (Optional)

$ make test

I got the following status:

327 tests OK.
33 tests skipped:
    test_aepack test_al test_applesingle test_bsddb185 test_bsddb3
    test_cd test_cl test_codecmaps_cn test_codecmaps_hk
    test_codecmaps_jp test_codecmaps_kr test_codecmaps_tw test_curses
    test_gl test_imgfile test_kqueue test_linuxaudiodev test_macos
    test_macostools test_normalization test_ossaudiodev test_pep277
    test_py3kwarn test_scriptpackages test_socketserver test_startfile
    test_sunaudiodev test_timeout test_urllib2net test_urllibnet
    test_winreg test_winsound test_zipfile64
Those skips are all expected on linux2.

Install
```
$ make install
```
Finally, I added ~/lib/python2.6/bin to my PATH.

Notes

The first time I ran, I got this message:

Failed to find the necessary bits to build these modules:
_hashlib           _ssl               bsddb185
bz2                gdbm               readline
sunaudiodev
To find the necessary bits, look in setup.py in detect_modules() for the module's name.

I think I found Ubuntu packages for some of them:

_hashlib and _ssl in libssl-dev
bz2 in libbz2-dev
gdbm in libgdbm-dev
readline in libreadline5-dev
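
To confirm that a rebuild picked up the missing libraries, one option is to try importing each optional module with the freshly built interpreter. This is only a minimal sketch: check_modules is a made-up helper name, and the module list is an assumption that should be adjusted to the interpreter at hand (for example, gdbm is spelled dbm.gnu on Python 3). It uses importlib, which needs Python 2.7 or 3.x; on 2.6 itself, __import__(name) would do the same job.

```python
import importlib

def check_modules(names):
    """Return {module name: True if it imports, False otherwise}."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status

# Modules from the "necessary bits" message that have Ubuntu packages.
wanted = ['_hashlib', '_ssl', 'bz2', 'readline']
for name, ok in sorted(check_modules(wanted).items()):
    print('%-10s %s' % (name, 'ok' if ok else 'MISSING'))
```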

Notes on Python deployment using Fabric

2008-09-28T00:24:21-07:00

I found out about Fabric via Armin Ronacher's article Deploying Python Web Applications. Fabric is a Capistrano-inspired deployment tool for the Python community. It is very simple to use. There are 4 main commands: local runs a command on the local machine (much like os.system), run and sudo run a command on a remote machine as either a normal user or as root, and put transfers a file to a remote machine.

Here is a sample setup which displays information about the Apache processes on my remote EC2 instance.

Install Easy Install
Install Fabric
```
$ sudo easy_install Fabric
```
Create a file called fabfile.py located at ~/myproject
```
def ec2():
    set(fab_hosts = ['ec2-65-234-55-183.compute-1.amazonaws.com'],
        fab_user = 'sofeng',
        fab_password = 'mypassword',)

def ps_apache():
    run("ps -e -O rss,pcpu | grep apache")
```
Note: for security reasons, you can remove the password from the fabfile and Fabric will prompt for it interactively. Per the documentation, Fabric also supports key-based authentication.

Run it

$ cd ~/myproject
$ fab ec2 ps_apache

Results:

   Fabric v. 0.0.9, Copyright (C) 2008 Christian Vest Hansen.
   Fabric comes with ABSOLUTELY NO WARRANTY; for details type `fab warranty'.
   This is free software, and you are welcome to redistribute it
   under certain conditions; type `fab license' for details.

Running ec2...
Running ps_apache...
Logging into the following hosts as sofeng:
    ec2-65-234-55-183.compute-1.amazonaws.com
[ec2-65-234-55-183.compute-1.amazonaws.com] run: ps -e -O rss,pcpu | grep apache
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  2163  5504  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  2520 15812  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  2521  3664  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  2522  3664  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  2523  3664  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  2524  3664  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  2619  3664  0.0 S ?        00:00:00 /usr/sbin/apache2 -k start
[ec2-65-234-55-183.compute-1.amazonaws.com] out:  2629  1204  0.0 R ?        00:00:00 /bin/bash -l -c ps -e -O rss,pcpu | grep apache
Done.

Python urlparse example

2008-09-24T16:58:32-07:00

Here is an example of how to parse a URL using Python's urlparse module. See the urlparse module documentation for more information.

from urlparse import urlparse

url = 'http://www.gurlge.com:80/path/file.html;params?a=1#fragment'
o = urlparse(url)
print o.scheme
print o.netloc
print o.hostname
print o.port
print o.path
print o.params
print o.query
print o.fragment
print o.username
print o.password

Results:

http
www.gurlge.com:80
www.gurlge.com
80
/path/file.html
params
a=1
fragment
None
None
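
In Python 3 the urlparse module was folded into urllib.parse. Here is the same example as a sketch for Python 3; only the import and the print syntax change, and the loop over attribute names is just my shorthand for the ten print lines above.

```python
from urllib.parse import urlparse

url = 'http://www.gurlge.com:80/path/file.html;params?a=1#fragment'
o = urlparse(url)

# Print each component by name; the attributes are the same as in Python 2.
for name in ('scheme', 'netloc', 'hostname', 'port', 'path',
             'params', 'query', 'fragment', 'username', 'password'):
    print(name, '=', getattr(o, name))
```

Note that port comes back as an integer, while the other components are strings (or None when absent).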

How to get stdout and stderr using Python's subprocess module

2008-09-23T12:48:00-07:00

I wrote previously about how to get stdout and stderr using os.popen4. However, per the Python documentation, using the subprocess module is preferred:

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several other, older modules and functions, such as:

os.system
os.spawn*
os.popen*
popen2.*
commands.*

See the subprocess module documentation for more information.

Here is how to get stdout and stderr from a program using the subprocess module:

from subprocess import Popen, PIPE, STDOUT

cmd = 'ls /etc/fstab /etc/non-existent-file'
p = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
output = p.stdout.read()
print output

Results:

ls: cannot access /etc/non-existent-file: No such file or directory
/etc/fstab
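
For Python 3.5 and later, subprocess.run can do the same thing in one call. The following is a minimal sketch, not a drop-in replacement: instead of ls, it launches a small stand-in child (my own invention) that writes one line to stdout and one to stderr, so it does not depend on particular files existing.

```python
import subprocess
import sys

# Stand-in child process: one line to stdout, one line to stderr.
child = "import sys; print('to stdout'); sys.stderr.write('to stderr\\n')"

proc = subprocess.run(
    [sys.executable, '-c', child],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,   # merge stderr into the stdout pipe
    universal_newlines=True,    # decode the output to str instead of bytes
)
output = proc.stdout
print(output)
```

Passing the command as a list avoids shell=True. The relative order of the two lines in output can vary because the child buffers stdout when it is connected to a pipe.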

Notes on parallel processing with Python and Twisted

2008-09-12T17:23:17-07:00

Twisted is a networking engine written in Python that, among many other things, can be used to do parallel processing. It is very big, though, so I had a hard time finding what I needed. I browsed through the Twisted Documentation and the Twisted O'Reilly book. There is also a Recipe in the Python Cookbook. However, I found Bruce Eckel's article, Concurrency with Python, Twisted, and Flex to be the most helpful. (See also Bruce Eckel's initial article on Twisted: Grokking Twisted)

Here are my notes on running Bruce Eckel's example. I removed the Flex part because I didn't need or know anything about it. This example runs a Controller which starts a number of separate parallel processes running Solvers (a.k.a. workers). It also allows for communication between the Controller and Solvers. Though this example only runs on one machine, the article said extending this to multiple machines is not difficult. For a good explanation of how this works, please see the original article.

Here is solver.py which is copied from the original article. The actual "work" is done in the step method. I only added some debugging print statements for myself.

"""
solver.py
Original version by Bruce Eckel
Solves one portion of a problem, in a separate process on a separate CPU
"""
import sys, random, math
from twisted.spread import pb
from twisted.internet import reactor

class Solver(pb.Root):

    def __init__(self, id):
        print "solver.py %s: solver init" % id
        self.id = id

    def __str__(self): # String representation
        return "Solver %s" % self.id

    def remote_initialize(self, initArg):
        return "%s initialized" % self

    def step(self, arg):
        "Simulate work and return result"
        print "solver.py %s: solver step" % self.id
        result = 0
        for i in range(random.randint(1000000, 3000000)):
            angle = math.radians(random.randint(0, 45))
            result += math.tanh(angle)/math.cosh(angle)
        return "%s, %s, result: %.2f" % (self, str(arg), result)

    # Alias methods, for demonstration version:
    remote_step1 = step
    remote_step2 = step
    remote_step3 = step

    def remote_status(self):
        print "solver.py %s: remote_status" % self.id
        return "%s operational" % self

    def remote_terminate(self):
        print "solver.py %s: remote_terminate" % self.id
        reactor.callLater(0.5, reactor.stop)
        return "%s terminating..." % self

if __name__ == "__main__":
    port = int(sys.argv[1])
    reactor.listenTCP(port, pb.PBServerFactory(Solver(sys.argv[1])))
    reactor.run()

Here is controller.py. This is also copied from the original article but I removed the Flex interface and created calls to start and terminate in the Controller class. I'm not sure if this makes sense, but at least this allowed me to run the example. I also moved the terminate method from the FlexInterface to the Controller.

"""
Controller.py
Original version by Bruce Eckel
Starts and manages solvers in separate processes for parallel processing.
"""
import sys
from subprocess import Popen
from twisted.spread import pb
from twisted.internet import reactor, defer

START_PORT = 5566
MAX_PROCESSES = 2

class Controller(object):

    def broadcastCommand(self, remoteMethodName, arguments, nextStep, failureMessage):
        print "controller.py: broadcasting..."
        deferreds = [solver.callRemote(remoteMethodName, arguments) 
                     for solver in self.solvers.values()]
        print "controller.py: broadcasted"
        reactor.callLater(3, self.checkStatus)

        defer.DeferredList(deferreds, consumeErrors=True).addCallbacks(
            nextStep, self.failed, errbackArgs=(failureMessage,))
    
    def checkStatus(self):
        print "controller.py: checkStatus"
        for solver in self.solvers.values():
            solver.callRemote("status").addCallbacks(
                lambda r: sys.stdout.write(r + "\n"), self.failed, 
                errbackArgs=("Status Check Failed",))
                                                     
    def failed(self, results, failureMessage="Call Failed"):
        print "controller.py: failed"
        for (success, returnValue), (address, port) in zip(results, self.solvers):
            if not success:
                raise Exception("address: %s port: %d %s" % (address, port, failureMessage))

    def __init__(self):
        print "controller.py: init"
        self.solvers = dict.fromkeys(
            [("localhost", i) for i in range(START_PORT, START_PORT+MAX_PROCESSES)])
        self.pids = [Popen(["python", "solver.py", str(port)]).pid
                     for ip, port in self.solvers]
        print "PIDS: ", self.pids
        self.connected = False
        reactor.callLater(1, self.connect)

    def connect(self):
        print "controller.py: connect"
        connections = []
        for address, port in self.solvers:
            factory = pb.PBClientFactory()
            reactor.connectTCP(address, port, factory)
            connections.append(factory.getRootObject())
        defer.DeferredList(connections, consumeErrors=True).addCallbacks(
            self.storeConnections, self.failed, errbackArgs=("Failed to Connect",))

        print "controller.py: starting parallel jobs"
        self.start()

    def storeConnections(self, results):
        print "controller.py: storeconnections"
        for (success, solver), (address, port) in zip(results, self.solvers):
            self.solvers[address, port] = solver
        print "controller.py: Connected; self.solvers:", self.solvers
        self.connected = True

    def start(self):
        "controller.py: Begin the solving process"
        if not self.connected:
            return reactor.callLater(0.5, self.start)
        self.broadcastCommand("step1", ("step 1"), self.step2, "Failed Step 1")

    def step2(self, results):
        print "controller.py: step 1 results:", results
        self.broadcastCommand("step2", ("step 2"), self.step3, "Failed Step 2")

    def step3(self, results):
        print "controller.py: step 2 results:", results
        self.broadcastCommand("step3", ("step 3"), self.collectResults, "Failed Step 3")

    def collectResults(self, results):
        print "controller.py: step 3 results:", results
        self.terminate()
        
    def terminate(self):
        print "controller.py: terminate"
        for solver in self.solvers.values():
            solver.callRemote("terminate").addErrback(self.failed, "Termination Failed")
        reactor.callLater(1, reactor.stop)
        return "Terminating remote solvers"

if __name__ == "__main__":
    controller = Controller()
    reactor.run()

To run it, put the two files in the same directory and run python controller.py. You should see 2 CPUs (if you have 2) go up to 100% usage. And here is the screen output:

controller.py: init
PIDS:  [12173, 12174]
solver.py 5567: solver init
solver.py 5566: solver init
controller.py: connect
controller.py: starting parallel jobs
controller.py: storeconnections
controller.py: Connected; self.solvers: {('localhost', 5567): , ('localhost', 5566): }
controller.py: broadcasting...
controller.py: broadcasted
solver.py 5566: solver step
solver.py 5567: solver step
controller.py: checkStatus
solver.py 5566: remote_status
Solver 5566 operational
solver.py 5567: remote_status
controller.py: step 1 results: [(True, 'Solver 5567, step 1, result: 683825.75'), (True, 'Solver 5566, step 1, result: 543177.17')]
controller.py: broadcasting...
controller.py: broadcasted
Solver 5567 operational
solver.py 5566: solver step
solver.py 5567: solver step
controller.py: checkStatus
solver.py 5566: remote_status
Solver 5566 operational
solver.py 5567: remote_status
controller.py: step 2 results: [(True, 'Solver 5567, step 2, result: 636793.90'), (True, 'Solver 5566, step 2, result: 335358.16')]
controller.py: broadcasting...
controller.py: broadcasted
Solver 5567 operational
solver.py 5566: solver step
solver.py 5567: solver step
controller.py: checkStatus
solver.py 5566: remote_status
Solver 5566 operational
solver.py 5567: remote_status
controller.py: step 3 results: [(True, 'Solver 5567, step 3, result: 847386.43'), (True, 'Solver 5566, step 3, result: 512120.15')]
controller.py: terminate
Solver 5567 operational
solver.py 5566: remote_terminate
solver.py 5567: remote_terminate

Notes on starting processes in Python

2008-09-12T15:49:31-07:00

Using `os.fork()`

Here is an example using os.fork() to spawn 5 processes, each running the Python function myfunc. Don't forget the os._exit() at the end. Per the docs, sys.exit() is normally used, but os._exit() can be used in child processes after a fork. It does not call cleanup handlers, flush stdio buffers, etc.

import os
import time

def myfunc(i):
    print "sleeping 5 seconds from process %s" % i
    time.sleep(5)
    print "finished sleeping from process %s" % i

for i in range(5):
    pid = os.fork()
    if pid == 0:
        myfunc(i)
        os._exit(0)

Results:

sleeping 5 seconds from process 0
sleeping 5 seconds from process 1
sleeping 5 seconds from process 2
sleeping 5 seconds from process 3
sleeping 5 seconds from process 4

And 5 seconds later...

finished sleeping from process 0
finished sleeping from process 1
finished sleeping from process 2
finished sleeping from process 3
finished sleeping from process 4
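
The example above never waits for its children, so the parent may exit before they do. Here is a minimal sketch of collecting them with os.waitpid (Unix only, Python 3 syntax). run_children is a made-up name, and each child simply exits with its index as the exit code instead of sleeping.

```python
import os

def run_children(n):
    """Fork n children and return their exit codes in fork order."""
    pids = []
    for i in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: the real work would go here. os._exit() ends the child
            # immediately so it never falls through into the parent's loop.
            os._exit(i)
        pids.append(pid)

    codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)         # block until this child exits
        codes.append(os.WEXITSTATUS(status))   # extract the exit code
    return codes

print(run_children(5))
```

Because os._exit() does not flush stdio buffers, a child that prints should flush explicitly (for example, print(..., flush=True)) before exiting.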

Running an external script in subprocesses

Alternatively, if you want to run an external script in multiple processes, you can use the Popen class in the subprocess module. For example, to run the following script, called "myscript.py":

"myscript.py"
import sys
import time

def myfunc(i):
    print "sleeping 5 seconds from process %s" % i
    time.sleep(5)
    print "finished sleeping from process %s" % i

if __name__ == '__main__':
    myfunc(sys.argv[1])

use the following Python code stored in the same directory:

"popen_ex.py"
from subprocess import Popen

for i in range(5):
    Popen(['python', './myscript.py', str(i)])

The screen output is the same as the previous example. What's the difference? fork() copies the process memory space, including open file descriptors, to the child process. In the second example, since I am executing a new Python interpreter from scratch, I get a "cleaner" start but probably more overhead as well.
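
As with the fork example, popen_ex.py does not wait for its children. A minimal Python 3 sketch that keeps the Popen objects and waits on each one follows. To stay self-contained it runs a one-line stand-in for myscript.py via python -c (my assumption, not part of the original script), and it uses sys.executable so the children run under the same interpreter as the parent.

```python
import sys
from subprocess import Popen

# Stand-in for myscript.py so the sketch runs without a second file.
code = "import sys; print('hello from process %s' % sys.argv[1])"

# Start all five children first, then wait, so they run concurrently.
procs = [Popen([sys.executable, '-c', code, str(i)]) for i in range(5)]
exit_codes = [p.wait() for p in procs]   # block until every child has finished
print(exit_codes)
```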

Simplistic Python Thread example

2008-09-09T23:57:30-07:00

Here is a simple Python example using the Thread object in the threading module.

import time
from threading import Thread

def myfunc(i):
    print "sleeping 5 sec from thread %d" % i
    time.sleep(5)
    print "finished sleeping from thread %d" % i

for i in range(10):
    t = Thread(target=myfunc, args=(i,))
    t.start()

Results:

sleeping 5 sec from thread 0
sleeping 5 sec from thread 1
sleeping 5 sec from thread 2
sleeping 5 sec from thread 3
sleeping 5 sec from thread 4
sleeping 5 sec from thread 5
sleeping 5 sec from thread 6
sleeping 5 sec from thread 7
sleeping 5 sec from thread 8
sleeping 5 sec from thread 9

...and 5 seconds later:

finished sleeping from thread 0
finished sleeping from thread 1
finished sleeping from thread 2
finished sleeping from thread 3
finished sleeping from thread 4
finished sleeping from thread 5
finished sleeping from thread 6
finished sleeping from thread 7
finished sleeping from thread 8
finished sleeping from thread 9
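
If the main program needs to know when all the threads are done, it can keep the Thread objects and call join() on each. This is a minimal sketch in Python 3 syntax; the finished list and the shorter 0.1 second sleep are my own additions for illustration.

```python
import time
from threading import Thread

finished = []

def myfunc(i):
    time.sleep(0.1)
    finished.append(i)   # list.append is atomic in CPython

threads = [Thread(target=myfunc, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()             # wait for this thread to finish

print("all %d threads finished" % len(finished))
```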

How to iterate over an instance object's data attributes in Python

2008-09-07T15:51:15-07:00

To list the attributes of a Python instance object, I could use the built-in dir() function. However, this will return the instance object's methods as well as its data attributes. To get just the data attributes, I can use the instance object's __dict__ attribute:

class A(object):
    def __init__(self):
        self.myinstatt1 = 'one'
        self.myinstatt2 = 'two'
    def mymethod(self):
        pass

a = A()
for attr, value in a.__dict__.iteritems():
    print attr, value

Results:

myinstatt2 two
myinstatt1 one
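
In Python 3, iteritems() no longer exists; items() takes its place, and the built-in vars() returns the same __dict__. A sketch of the Python 3 equivalent, with sorted() added only to make the output order predictable:

```python
class A(object):
    def __init__(self):
        self.myinstatt1 = 'one'
        self.myinstatt2 = 'two'
    def mymethod(self):
        pass

a = A()
# vars(a) is a.__dict__: instance data attributes only, no methods.
for attr, value in sorted(vars(a).items()):
    print(attr, value)
```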

How to conditionally replace items in a list

2008-08-22T12:53:48-07:00

I wanted to replace items in a list based on a specific condition. For example, given a list of numbers, I want to replace all items that are negative with zero.

Naive way

At first, I thought of something like this:

mylist = [111, -222, 333, -444]
newlist = []
for item in mylist:
    if item < 0:
        item = 0
    newlist.append(item)
mylist = newlist
print mylist

Which gave me the expected results:

[111, 0, 333, 0]

Better way?

Then I tried using Python's built-in enumerate function (see my previous example) to replace the item in place. This seems to be a more elegant solution to me. Is there a better way? How would you do it?

mylist = [111, -222, 333, -444]
for (i, item) in enumerate(mylist):
    if item < 0:
        mylist[i] = 0
print mylist

Results:

[111, 0, 333, 0]
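
One possible answer to the question above is a list comprehension with a conditional expression. This is a sketch in Python 3 syntax. Assigning to the slice mylist[:] keeps the replacement in place, so other references to the same list (alias here is a made-up name to show that) see the change too.

```python
mylist = [111, -222, 333, -444]
alias = mylist   # a second reference to the same list object

# Build the new values, then write them back into the existing list.
mylist[:] = [0 if item < 0 else item for item in mylist]
print(mylist)
```

Without the slice (mylist = [...]), the name would be rebound to a new list and alias would still hold the old values.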

Python recursion example to navigate tree data

2008-08-19T16:43:48-07:00

Here is a simple Python example using recursion to navigate a nested Python data structure. Each node in the data structure contains 0 or more children. In this simple example, I look at each node and print the "text" indented according to the nesting level within the data structure.

Update 2008-09-15: Nihiliad posted an improvement to my example in the comments. It is much simpler. I have updated my example below.

Nihiliad's (improved) method

data = {'count': 2,
        'text': '1',
        'kids': [{'count': 3,
                  'text': '1.1',
                  'kids': [{'count': 1,
                            'text': '1.1.1',
                            'kids': [{'count':0,
                                      'text': '1.1.1.1',
                                      'kids': []}]},
                           {'count': 0,
                            'text': '1.1.2',
                            'kids': []},
                           {'count': 0,
                            'text': '1.1.3',
                            'kids': []}]},
                 {'count': 0,
                  'text': '1.2',
                  'kids': []}]}

def traverse(data):
    print ' ' * traverse.level + data['text']
    for kid in data['kids']:
        traverse.level += 1
        traverse(kid)
        traverse.level -= 1

if __name__ == '__main__':
    traverse.level = 1
    traverse(data)

Results:

 1
  1.1
   1.1.1
    1.1.1.1
   1.1.2
   1.1.3
  1.2

My original (inferior) method

def outer(data):
    class Namespace: pass
    ns = Namespace()
    ns.level = 1
    
    def inner(data):
        print ' ' * ns.level + data['text']
        if data['count'] > 0:
            ns.level += 1
            for kid in data['kids']:
                inner(kid)
            ns.level -= 1

    inner(data)

if __name__ == '__main__':
    outer(data)
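
A third variant, which I'm adding as a sketch rather than as anyone's posted method, passes the nesting level as an argument, so no function attribute or namespace object is needed. It is written in Python 3 syntax, returns the lines instead of printing them directly, and uses a smaller tree (tree is a made-up name) to stay self-contained.

```python
def traverse(node, level=1):
    """Return the indented text lines for node and all of its descendants."""
    lines = [' ' * level + node['text']]
    for kid in node['kids']:
        lines.extend(traverse(kid, level + 1))
    return lines

# A smaller tree than the one above, in the same shape.
tree = {'text': '1',
        'kids': [{'text': '1.1',
                  'kids': [{'text': '1.1.1', 'kids': []}]},
                 {'text': '1.2', 'kids': []}]}

print('\n'.join(traverse(tree)))
```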

On using Python, the Digg API, and simplejson

2008-08-19T11:05:50-07:00

Here are some quick notes on using the Digg API with a Python script. Note, there is a Python toolkit for Digg but I just used urllib2 and the Digg API endpoints for the sake of simplicity.

I wanted the output in JSON format so I specified the response type as JSON. To decode JSON directly to a Python data structure, I used simplejson.

Here is a simple example which returns the JSON output for the Digg story Dell vs. Apple: This Time it's Personal which has a "clean title" of Dell_vs_Apple_This_Time_it_s_Personal.

#!/usr/bin/env python

import urllib2

APPKEY = 'http%3A%2F%2Fwww.example.com'
story_clean_title = 'Dell_vs_Apple_This_Time_it_s_Personal'
url = ''.join([
        'http://services.digg.com',
        '/story/%s' % story_clean_title,
        '?appkey=%s' % APPKEY,
        '&type=json',
        ])
json = urllib2.urlopen(url).read()
print json

Results:

{"timestamp":1219168025,"total":"1","offset":0,"stories":[{"id":"8038250","link":"http:\/\/www.businessweek.com\/magazine\/content\/08_34\/b4097022701166.htm?campaign_id=rss_daily","submit_date":1219047878,"diggs":763,"comments":198,"title":"Dell vs. Apple: This Time it's Personal","description":"Now Bucher is again squaring off against his former company. He's spearheading an ambitious plan at Dell (DELL) to break Apple's dominant hold on the digital entertainment market.","promote_date":1219095692,"status":"popular","media":"news","user":{"name":"msaleem","icon":"http:\/\/digg.com\/users\/msaleem\/l.png","registered":1126518985,"profileviews":136052,"fullname":"Muhammad Saleem"},"topic":{"name":"Apple","short_name":"apple"},"container":{"name":"Technology","short_name":"technology"},"thumbnail":{"originalwidth":370,"originalheight":245,"contentType":"image\/jpeg","src":"http:\/\/digg.com\/apple\/Dell_vs_Apple_This_Time_it_s_Personal\/t.jpg","width":80,"height":80},"href":"http:\/\/digg.com\/apple\/Dell_vs_Apple_This_Time_it_s_Personal"}],"count":1}

Here is a slightly less simple example which returns the comments for the same story above. It uses simplejson to decode the Digg story JSON data and get the story ID which is then used to get the comment data.

#!/usr/bin/env python

import simplejson
import urllib2
from pprint import pprint

APPKEY = 'http%3A%2F%2Fwww.example.com'

def main():
    story_clean_title = 'Dell_vs_Apple_This_Time_it_s_Personal'

    # get story
    json = get_json('/story/%s' % story_clean_title)
    pydata = simplejson.loads(json)
    story_id = pydata['stories'][0]['id']

    # get comments
    json = get_json('/story/%s/comments' % story_id)
    pydata = simplejson.loads(json)
    pprint(pydata)

def get_json(endpoint):
    """ returns json data for requested digg endpoint 
    """
    url = ''.join([
            'http://services.digg.com',
            endpoint,
            '?appkey=%s' % APPKEY,
            '&type=json',
            ])
    return urllib2.urlopen(url).read()

if __name__ == '__main__':
    main()

Results:

{u'comments': [{u'content': u"For those who are having trouble understanding what this is about.  This is not about creating a competing closed platform.  This is about creating a standard platform for selling music online.  Imagine iTunes but with the ability to add other music stores into it.  So when you want to buy a song, you can browse this music store or that music store.  And all will work with your mp3 player.  Whether it is a basic one that mounts as an external usb drive, or one that is able to sync the songs up the way iTunes does it with the iPod.  And we know it is gonna be good because the guy that is driving it worked at Apple, so he knows what quality is.  I'm sure it will have the ability to import or export your music library to iTunes if you so choose.  But the point is that it is trying to create a standard that any device maker can follow.  And hopefully, it will have no drm.  Otherwise it's as useless as iTunes with its drm.\n\nRemember the same thing happened when windows 3.1 came along and immediately sold as many as 10x the number of pc's compared to mac's at the time.  Anytime there is an open standard, no matter how good the closed standard is, the open one wins because that means lower prices for consumers since any manufacturer can use it.  Apple thrives on closed standards when it comes to selling their products because it enables them to lock their customers in.  And before they know it, they're locked into it and can't get out of it without great expense.\n\nI know that the Apple fanboi's are gonna bury me for this, but I'll say it anyways.  It was nice being on top.  But you can't be on top forever.",
                u'date': 1219163527,
                u'down': 0,
                u'id': 17963749,
                u'level': 0,
                u'replies': 0,
                u'replyto': None,
                u'root': 17963749,
                u'story': 8038250,
                u'up': 1,
                u'user': u'pyrates'},
               {u'content': u'@thinkdifferent: I should of specified its a $1 cheaper when you buy the full album. Single tracks are the same price but albums are usually about $8.99. ',
                u'date': 1219159148,
                u'down': 0,
                u'id': 17961525,
                u'level': 0,
                u'replies': 0,
                u'replyto': None,
                u'root': 17961525,
                u'story': 8038250,
                u'up': 1,
                u'user': u'mrgermy'},
               {u'content': u'Zune XPS w/ Vista Ultimate combo deal.',
                u'date': 1219146354,
                u'down': 0,
                u'id': 17956943,
                u'level': 0,
                u'replies': 0,
                u'replyto': None,
                u'root': 17956943,
                u'story': 8038250,
                u'up': 1,
                u'user': u'hurdboy'},
               {u'content': u'Digg: "Dell vs. Apple: This Time it\'s Personal"\nBusiness Week: "Bucher says his quest to challenge Apple is all business and not personal."\n\nBurried as inacurate.',
                u'date': 1219134278,
                u'down': 0,
                u'id': 17954900,
                u'level': 0,
                u'replies': 0,
                u'replyto': None,
                u'root': 17954900,
                u'story': 8038250,
                u'up': 1,
                u'user': u'KAMiKAZOW'},
               {u'content': u'Dell is junk and Apple is overpriced',
                u'date': 1219117388,
                u'down': 0,
                u'id': 17950148,
                u'level': 0,
                u'replies': 0,
                u'replyto': None,
                u'root': 17950148,
                u'story': 8038250,
                u'up': 1,
                u'user': u'DeuceDiggalow'},
               {u'content': u'Dell, all you have to do to kick apple in the jewels is get your act together with Ubuntu.\n\nAmarok will do the rest.',
                u'date': 1219112434,
                u'down': 2,
                u'id': 17948056,
                u'level': 0,
                u'replies': 4,
                u'replyto': None,
                u'root': 17948056,
                u'story': 8038250,
                u'up': 3,
                u'user': u'ethana2'},
               {u'content': u'Apple = Overpriced, under-featured, but pretty\nDell = priced right, full-featured, and awesome looking',
                u'date': 1219105961,
                u'down': 8,
                u'id': 17945330,
                u'level': 0,
                u'replies': 2,
                u'replyto': None,
                u'root': 17945330,
                u'story': 8038250,
                u'up': 3,
                u'user': u'freesf'},
               {u'content': u'Go Dell! Best of luck to you :)',
                u'date': 1219104145,
                u'down': 0,
                u'id': 17944493,
                u'level': 0,
                u'replies': 0,
                u'replyto': None,
                u'root': 17944493,
                u'story': 8038250,
                u'up': 0,
                u'user': u'Sabre24q7'},
               {u'content': u'Actually the Dell DJ wasnt that bad for its time. Obviously now its looks very dated and far from the best now, but at its time it was a decent player and had pretty good sound quality. It also had more features than the ipod does even today.\n\n Creative was the people who made it and it was a very basic player but it did what it was suppose to and had a lot of room on it for a cheap price. ',
                u'date': 1219104085,
                u'down': 2,
                u'id': 17944474,
                u'level': 0,
                u'replies': 1,
                u'replyto': None,
                u'root': 17944474,
                u'story': 8038250,
                u'up': 1,
                u'user': u'jsc315'},
               {u'content': u"I'm not saying this will or won't succeed. I'm just pointing out something people seem to be missing.\n\nJust because Dell customers aren't pompous and loudmouthed doesn't mean they aren't loyal and happy customers. There are a LOT of Dell users that like the company and what the products they make.",
                u'date': 1219102724,
                u'down': 1,
                u'id': 17943861,
                u'level': 0,
                u'replies': 1,
                u'replyto': None,
                u'root': 17943861,
                u'story': 8038250,
                u'up': 2,
                u'user': u'Urkel'}],
 u'count': 10,
 u'offset': 0,
 u'timestamp': 1219168299,
 u'total': u'55'}

Somewhere on your Python path

2008-08-05T15:11:00-07:00

As I install new Python packages, I sometimes see instructions which say something like "check out the code, and place it somewhere on your Python path". These are very simple instructions, but since it is not automatic like a Windows installer or Ubuntu's package management system, it causes me to pause. Where on my Python path should I put it? I could put all my packages in random places and update my PYTHONPATH environment variable every time. I also thought about putting new packages in Python's site-packages directory. This is probably a good option. However, I like to keep all my important stuff in my home directory so I can more easily maintain it across multiple machines. (Also, I forget where site-packages lives; it is /usr/lib/python2.5/site-packages on Ubuntu Hardy.) So my solution was to create my own python-packages directory in ~/lib. I set the PYTHONPATH in my ~/.bashrc as follows:

export PYTHONPATH=$HOME/lib/python-packages

and then put all my Python packages here.
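To confirm that the directory really ends up on the import path, something like the following can be run from a new shell. This is only a Python 3 sketch: the helper name `on_python_path` is made up for illustration, and it does nothing more than compare normalized paths against `sys.path`.

```python
import os
import sys


def on_python_path(directory, path=None):
    """Return True if `directory` is one of the entries on the import path."""
    entries = sys.path if path is None else path
    target = os.path.realpath(os.path.expanduser(directory))
    return any(
        # an empty entry means "the current directory"
        os.path.realpath(os.path.expanduser(entry or os.curdir)) == target
        for entry in entries
    )


if __name__ == "__main__":
    # Assumes PYTHONPATH was exported as shown above.
    print(on_python_path("~/lib/python-packages"))
```

If this prints False, the usual cause is that ~/.bashrc has not been re-read by the current shell.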

Update: Some Python packages are distributed with other stuff besides the actual Python package. To handle these cases, I created a dist directory inside my python-packages directory, and created symbolic links from the actual package directory in dist to python-packages. Finally, I made the entire python-packages directory a Mercurial repository so I can finely control my Python environment and easily maintain it across multiple machines. Here's what my ~/lib/python-packages currently looks like:

drwxr-xr-x 9 sofeng sofeng 4096 2008 08/05 21:40 dist
drwxr-xr-x 4 sofeng sofeng 4096 2008 08/05 21:31 django_openidconsumer
lrwxrwxrwx 1 sofeng sofeng   43 2008 08/05 21:31 elementtree -> dist/elementtree-1.2.6-20050316/elementtree
lrwxrwxrwx 1 sofeng sofeng   31 2008 08/05 21:31 openid -> dist/python-openid-1.2.0/openid
lrwxrwxrwx 1 sofeng sofeng   31 2008 08/05 21:31 openid2.2 -> dist/python-openid-2.2.1/openid
lrwxrwxrwx 1 sofeng sofeng   27 2008 08/05 21:41 pygments -> dist/Pygments-0.10/pygments
lrwxrwxrwx 1 sofeng sofeng   29 2008 08/05 21:31 urljr -> dist/python-urljr-1.0.1/urljr
lrwxrwxrwx 1 sofeng sofeng   29 2008 08/05 21:31 yadis -> dist/python-yadis-1.1.0/yadis
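The links in the listing could be created by hand with ln -s, but here is a rough Python 3 sketch of the same step. The function name `link_package` and its arguments are hypothetical; the only thing taken from the listing is the layout, with a relative link pointing from python-packages into dist.

```python
import os


def link_package(packages_dir, dist_subdir, package_name):
    """Create packages_dir/package_name -> dist/<dist_subdir>/<package_name>.

    The link target is relative, matching the listing above, so the whole
    directory can be moved or cloned without breaking the links.
    """
    target = os.path.join("dist", dist_subdir, package_name)
    link = os.path.join(packages_dir, package_name)
    if not os.path.isdir(os.path.join(packages_dir, target)):
        raise ValueError("no such package directory: %s" % target)
    if os.path.islink(link):
        os.remove(link)  # replace a stale link, e.g. after upgrading
    os.symlink(target, link)
    return link
```

For example, `link_package(os.path.expanduser("~/lib/python-packages"), "Pygments-0.10", "pygments")` would recreate the pygments link shown above.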

Update 2008-09-14: Here is a post on the django-developers mailing list by Kevin Teague which explains the large number of technologies related to Python package management and deployment including PyPI, Distutils, Eggs, Easy Install, VirtualEnv, and Buildout. Kevin admits that package management and deployment is an area in Python where there is room for a great deal of improvement. He notes that the symlinking method that I use can work for simple needs, but it fails for more complicated use cases, such as tracking package dependencies. The new Virtualenv and Buildout technologies seem to be interesting-- I will have to check them out when I have time. I found this link via Simon Willison.

Update 2008-09-24: Ian Bicking, author of Virtualenv, has just released pyinstall which seems to be an improved easy_install. I have not tried it yet, but I believe Ian Bicking writes good code.

Update 2008-10-24: Glyph Lefkowitz, lead architect of Twisted, suggests using twisted.python.modules for solving Python Path Programming Problems. This looks like something I could use in my current project-- I just wish I understood it.

Update 2008-12-16:

James Bennett (Django release manager): Please, for the love of Guido, stop using setuptools and easy_install, and use distutils and pip instead. (pip is the new name for pyinstall, by Ian Bicking.)
Ian Bicking: A Few Corrections To “On Packaging”

How to use gnip-python to retrieve activity from Twitter, Delicious, Digg, etc.

2008-08-01T16:09:38-07:00

Create an account at http://www.gnipcentral.com/
Download gnip-python from github.com.

Unpack it:

$ tar -zxvf gnip-gnip-python-028364a70bd40dda0069ecdd3e7f6fff23bb985e.tar.gz

Move it to your example directory:

$ mkdir ~/src/python/gnip-example
$ mv gnip-gnip-python-028364a70bd40dda0069ecdd3e7f6fff23bb985e/*.py ~/src/python/gnip-example

Create an example file called ~/src/python/gnip-example/gnip-example.py:

#!/usr/bin/env python

from gnip import *

gnip = Gnip("yourgniplogin@email.com", "yourpassword")

for publisher in ["twitter", "digg", "delicious"]:
    activities = gnip.get_publisher_activities(publisher)
    print
    print publisher
    for activity in activities[:5]:
        print activity

Run it:

$ python gnip-example.py

And get the following results:

twitter
[derricklo, 2008-08-01T22:49:59+00:00, tweet, http://twitter.com/derricklo/statuses/875165550]
[sam_metal, 2008-08-01T22:50:01+00:00, tweet, http://twitter.com/sam_metal/statuses/875165564]
[lalatina, 2008-08-01T22:49:59+00:00, tweet, http://twitter.com/lalatina/statuses/875165544]
[Nochipra, 2008-08-01T22:50:01+00:00, tweet, http://twitter.com/Nochipra/statuses/875165562]
[jmcgaha, 2008-08-01T22:50:01+00:00, tweet, http://twitter.com/jmcgaha/statuses/875165556]

digg
[DAlexopoulos, 2008-08-01T18:50:08+00:00, submission, http://digg.com/health/CLA_Conjugated_Linoleic_Acid_Explained]
[EradicateIV, 2008-08-01T18:50:05+00:00, submission, http://digg.com/pc_games/Pittco_Iron_Storm_9]
[vivianpetman, 2008-08-01T18:49:58+00:00, submission, http://digg.com/business_finance/Dub_Me_Now_Turns_Business_Cards_into_Bits_and_Bytes_Small]
[portia7896, 2008-08-01T18:49:53+00:00, submission, http://digg.com/world_news/Radioactive_water_yum]
[hhdepot, 2008-08-01T18:49:52+00:00, submission, http://digg.com/2008_us_elections/Why_WE_are_democrats_in_2008_Original_Video]

delicious
[hanasama1, 2008-08-01T22:49:55+00:00, delicious, http://www.birchmere.com/]
[Chrmftcotl, 2008-08-01T22:49:54+00:00, delicious, http://www.shaunlow.com/a-definitive-stumbleupon-guide-driving-traffic-to-websites/]
[shankar, 2008-08-01T22:49:53+00:00, delicious, http://www.guardian.co.uk/books/2008/jul/26/salmanrushdie.bookerprize]
[grzyweasel, 2008-08-01T22:49:53+00:00, delicious, http://patterntap.com/]
[metaffect, 2008-08-01T22:50:05+00:00, delicious, http://dean.edwards.name/]

Pretty cool.
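The printed activities are easy to pick apart if you want to do more than look at them. The following Python 3 sketch works on the text shown above, not on the gnip-python objects themselves (their attributes are not covered here), and the helper name `parse_activity` is made up. It assumes each line has exactly the four comma-separated fields seen in the output.

```python
from datetime import datetime


def parse_activity(line):
    """Split '[actor, timestamp, action, url]' into typed fields."""
    actor, timestamp, action, url = line.strip().strip("[]").split(", ", 3)
    return {
        "actor": actor,
        "at": datetime.fromisoformat(timestamp),  # needs Python 3.7+
        "action": action,
        "url": url,
    }


sample = "[derricklo, 2008-08-01T22:49:59+00:00, tweet, http://twitter.com/derricklo/statuses/875165550]"
activity = parse_activity(sample)
print(activity["actor"], activity["at"].year, activity["action"])
```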

This is just scratching the surface of what you can do with gnip. You can also filter by time or user. Or get XML output. Or you can publish to gnip yourself. See the gnip-python README for more Python examples or the Gnip API for more detailed information. Also, here is a list of Gnip publishers.

If you get an `ImportError: No module named iso8601` error, install iso8601.

Install Python Easy Install
Install iso8601:
```
$ sudo easy_install -U iso8601
```

How to search C code for division or sqrt

2008-07-24T15:12:20-07:00

The following Python script searches through C code for division or sqrt and prints the line of code and the line number. It skips C comments. To use, run python find_divides.py filename.c

#!/usr/bin/python

"""find_divides.py

usage: python find_divides.py filename
"""

import re
import sys

def main():
    filename = sys.argv[1]
    text = open(filename).read()
    lines = text.splitlines()
    # prefix each line with its 1-based line number (enumerate starts at 0)
    lines = ["%4d: %s" % (i + 1, line) for (i, line) in enumerate(lines)]
    text = "\n".join(lines)
    text = remove_comments_and_strings(text)

    for line in text.splitlines():
        if ("/" in line) or ("sqrt" in line):
            print line

def remove_comments_and_strings(text):
    """ remove c-style comments and strings
        text: blob of text with comments (can include newlines)
        returns: text with comments and strings removed
    """
    pattern = r"""
                            ##  --------- COMMENT ---------
           /\*              ##  Start of /* ... */ comment
           [^*]*\*+         ##  Non-* followed by 1-or-more *'s
           (                ##
             [^/*][^*]*\*+  ##
           )*               ##  0-or-more things which don't start with /
                            ##    but do end with '*'
           /                ##  End of /* ... */ comment
         |                  ##  -OR-  various things which aren't comments:
           (                ## 
                            ##  ------ " ... " STRING ------
             "              ##  Start of " ... " string
             (              ##
               \\.          ##  Escaped char
             |              ##  -OR-
               [^"\\]       ##  Non "\ characters
             )*             ##
             "              ##  End of " ... " string
           |                ##  -OR-
                            ##
                            ##  ------ ' ... ' STRING ------
             '              ##  Start of ' ... ' string
             (              ##
               \\.          ##  Escaped char
             |              ##  -OR-
               [^'\\]       ##  Non '\ characters
             )*             ##
             '              ##  End of ' ... ' string
           |                ##  -OR-
                            ##
                            ##  ------ ANYTHING ELSE -------
             (.             ##  Any other char
             [^/"'\\]*)     ##  Chars which don't start a comment, string
           )                ##    or escape
    """
    regex = re.compile(pattern, re.VERBOSE|re.MULTILINE|re.DOTALL)
    goodstuff = [m.group(5) for m in regex.finditer(text) if m.group(5)]
    return "".join(goodstuff)

if __name__ == "__main__":
    main()
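For comparison, here is a shorter Python 3 sketch of the same idea. It is not the script above: instead of collecting the "good stuff" with a capture group, it blanks out comments and string literals in place (keeping newlines), so line numbers never need to be embedded in the text. The names `find_divides` and `blank_out` are made up for this example, and it also skips // comments, which the original does not handle.

```python
import re

# Matches a /* ... */ comment, a // comment, or a "..." / '...' literal.
_SKIP = re.compile(
    r"""
    /\*.*?\*/            # block comment (non-greedy, may span lines)
  | //[^\n]*             # line comment
  | "(?:\\.|[^"\\])*"    # double-quoted string
  | '(?:\\.|[^'\\])*'    # character literal
    """,
    re.VERBOSE | re.DOTALL,
)


def blank_out(match):
    # Keep line breaks so line numbers stay aligned with the original file.
    return "".join(ch if ch in "\r\n" else " " for ch in match.group(0))


def find_divides(source):
    """Return (line_number, line) pairs that contain '/' or 'sqrt' in code."""
    cleaned = _SKIP.sub(blank_out, source)
    hits = []
    pairs = zip(cleaned.splitlines(), source.splitlines())
    for number, (clean, original) in enumerate(pairs, start=1):
        if "/" in clean or "sqrt" in clean:
            hits.append((number, original))
    return hits


sample = (
    "int a = b / c; /* x / y */\n"
    "// sqrt here\n"
    'char *s = "a/b";\n'
    "double r = sqrt(x);\n"
)
for number, line in find_divides(sample):
    print("%4d: %s" % (number, line))
```

Only lines 1 and 4 of the sample are reported; the slash inside the comment, the sqrt in the // comment, and the slash inside the string are ignored.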

Django Blog Project #9: Migrating Blogger posts with Beautiful Soup

2008-07-14T22:47:00-07:00

Last post, I talked about adding comments to my new sample blog application. This was about the last basic feature I needed to add before I started actually using it for real. Of course there are still a number of features I'd like to add, such as automatic syntax highlighting with Pygments, and incorporating django-tagging and some more interesting views, not to mention comment moderation. But I think those will have to wait-- I want to start using my new blog for real sometime.

So for the past few days, I've been working on my Beautiful Soup screen scraper script to copy all my Blogger posts over to my new Django blog. Initial results came quickly (it's pretty cool to see such a huge data dump after only a few lines of Beautiful Soup'ing) but the details (especially with the comments) kind of slowed me down. I've finally got everything copied over to my satisfaction. Below is the script I used to do it. Note, I realize it's not pretty-- just a one time use hack. But hopefully someone else doing the same thing might find it useful.

#!/usr/bin/env python

import datetime
import os
import re
import urllib2
from BeautifulSoup import BeautifulSoup
from myblogapp.models import Post, LegacyComment
from django.contrib.comments.models import FreeComment

URL = ''.join([
        'http://iwiwdsmi.blogspot.com/search?',
        'updated-min=2006-01-01T00%3A00%3A00-08%3A00&',
        'updated-max=2009-01-01T00%3A00%3A00-08%3A00&',
        'max-results=1000'
        ])
html = urllib2.urlopen(URL).read()
soup = BeautifulSoup(html)

# running totals for posts that contain highlighted code sections
hl_count = 0
hl_list = []

for post in soup.html.body.findAll('div', {'class': 'post'}):
    print
    print '--------------------------------------------------------------'

    # save the post title and permalink
    h3 = post.find('h3', {'class': 'post-title'})
    post_href = h3.find('a')['href']
    post_title = h3.find('a').string
    # drop the ".html" extension; rstrip('.html') strips a character set, not a suffix
    post_slug = os.path.splitext(os.path.basename(post_href))[0]
    print post_slug
    print post_href
    print post_title

    # save the post body
    div = post.find('div', {'class': 'post-body'})
    [toremove.extract() for toremove in div.findAll('script')]
    [toremove.extract() for toremove in div.findAll('span', {'id': 'showlink'})]
    [toremove.extract() for toremove in div.findAll('div', {'style': 'clear: both;'})]
    [toremove.parent.extract() for toremove in div.findAll(text='#fullpost{display:none;}')]
    post_body = ''.join([str(item)
                         for item in div.contents
                         ]).rstrip()
    post_body = re.sub(r"iwiwdsmi\.blogspot\.com/(\d{4}/\d{2}/[\w\-]+)\.html", 
                       r"www.saltycrane.com/blog/\1/", 
                       post_body)

    # count number of highlighted code sections 
    highlight = div.findAll('div', {'class': 'highlight'})
    if highlight:
        hl_count += len(highlight)
        hl_list.append(post_title)

    # save the timestamp
    a = post.find('a', {'class': 'timestamp-link'})
    try:
        post_timestamp = a.string
    except AttributeError:
        match = re.search(r"\.com/(\d{4})/(\d{2})/", post_href)
        if match:
            year = match.group(1)
            month = match.group(2)
        post_timestamp = "%s/01/%s 11:11:11 AM" % (month, year)
    print post_timestamp

    # save the tags (this is ugly, i know)
    if 'error' in post_title.lower():
        post_tags = ['error']
    else:
        post_tags = []
    span = post.find('span', {'class': 'post-labels'})
    if span:
        a = span.findAll('a', {'rel': 'tag'})
    else:
        a = post.findAll('a', {'rel': 'tag'})
    post_tags = ' '.join([tag.string for tag in a] + post_tags)
    if not post_tags:
        post_tags = 'untagged'
    print post_tags

    # add Post object to new blog
    if True:
        p = Post()
        p.title = post_title
        p.body = post_body
        p.date_created = datetime.datetime.strptime(post_timestamp, "%m/%d/%Y %I:%M:%S %p")
        p.date_modified = p.date_created
        p.tags = post_tags
        p.slug = post_slug
        p.save()

    # check if there are comments
    a = post.find('a', {'class': 'comment-link'})
    if a:
        comm_string = a.string.strip()
    else:
        comm_string = "0"
    if comm_string[0] != "0":
        print
        print "COMMENTS:"

        # get the page with comments
        html_single = urllib2.urlopen(post_href).read()
        soup_single = BeautifulSoup(html_single)

        # get comments
        comments = soup_single.html.body.find('div', {'class': 'comments'})
        cauth_list = comments.findAll('dt')
        cbody_list = comments.findAll('dd', {'class': 'comment-body'})
        cdate_list = comments.findAll('span', {'class': 'comment-timestamp'})

        if not len(cauth_list)==len(cbody_list)==len(cdate_list):
            raise ValueError("didn't get all comment data")

        for auth, body, date in zip(cauth_list, cbody_list, cdate_list):
            
            # create comment in database
            lc = LegacyComment()
            lc.body = str(body.p)

            # find author
            lc.author = "Anonymous"
            auth_a = auth.findAll('a')[-1]
            auth_no_a = auth.contents[2]
            if auth_a.string:
                lc.author = auth_a.string
            elif auth_no_a:
                match = re.search(r"\s*([\w\s]*\w)\s+said", str(auth_no_a))
                if match:
                    lc.author = match.group(1)
            print lc.author

            # find website
            try:
                lc.website = auth_a['href']
            except KeyError:
                lc.website = ''
            print lc.website

            # other info
            lc.date_created = datetime.datetime.strptime(
                date.a.string.strip(), "%m/%d/%Y %I:%M %p")
            print lc.date_created
            lc.date_modified = lc.date_created
            lc.post_id = p.id
            lc.save()
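Two pieces of the script are easy to try on their own without Blogger or Django: the regular expression that rewrites old permalinks, and the timestamp format string. Here is a small Python 3 sketch of just those two steps. The function names are made up, and the slug in the sample link is an invented example rather than a real post.

```python
import re
from datetime import datetime

BLOGGER_LINK = re.compile(r"iwiwdsmi\.blogspot\.com/(\d{4}/\d{2}/[\w\-]+)\.html")


def rewrite_links(html):
    """Point old Blogger permalinks at the new /blog/YYYY/MM/slug/ URLs."""
    return BLOGGER_LINK.sub(r"www.saltycrane.com/blog/\1/", html)


def parse_blogger_timestamp(text):
    """Parse a post timestamp such as '07/14/2008 10:47:00 PM'."""
    return datetime.strptime(text, "%m/%d/%Y %I:%M:%S %p")


# The slug below is a made-up example, not a real post.
old = '<a href="http://iwiwdsmi.blogspot.com/2008/07/example-post.html">link</a>'
print(rewrite_links(old))
print(parse_blogger_timestamp("07/14/2008 10:47:00 PM"))
```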

I also made some changes to my Django blog code as I migrated my Blogger posts. The main addition was a LegacyComment model along with the associated views and templates. My Blogger comments consisted of HTML markup, but I didn't want to allow arbitrary HTML in my new comments for fear of cross site scripting. So I separated my legacy Blogger comments from my new Django site comments.

models.py

Here are my model changes. I added a LegacyComment class which contains pertinent comment attributes and a ForeignKey to the post that it belongs to. I also added a lc_count (for legacy comment count) field to the Post class which stores the number of comments for the post. It is updated by the save() method in the LegacyComment class every time a comment is saved. Hmmm, I just realized the count will be wrong if I ever edit these comments. Well, since these are legacy comments, hopefully I won't have to edit them.

~/src/django/myblogsite/myblogapp/models.py:

import re
from django.db import models

class Post(models.Model):
    title = models.CharField(maxlength=200)
    slug = models.SlugField(maxlength=100)
    date_created = models.DateTimeField() #auto_now_add=True)
    date_modified = models.DateTimeField()
    tags = models.CharField(maxlength=200)
    body = models.TextField()
    body_html = models.TextField(editable=False, blank=True)
    lc_count = models.IntegerField(default=0, editable=False)

    def get_tag_list(self):
        return re.split(" ", self.tags)

    def get_absolute_url(self):
        return "/blog/%d/%02d/%s/" % (self.date_created.year,
                                      self.date_created.month,
                                      self.slug)

    def __str__(self):
        return self.title

    class Meta:
        ordering = ["-date_created"]

    class Admin:
        pass

class LegacyComment(models.Model):
    author = models.CharField(maxlength=60)
    website = models.URLField(core=False)
    date_created = models.DateTimeField()
    date_modified = models.DateTimeField()
    body = models.TextField()
    post = models.ForeignKey(Post)

    def save(self):
        p = Post.objects.get(id=self.post.id)
        p.lc_count += 1
        p.save()
        super(LegacyComment, self).save()

    class Meta:
        ordering = ["date_created"]

    class Admin:
        pass
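One possible way around the counting problem mentioned above is to bump the counter only the first time a comment is saved. The sketch below shows the idea in plain Python 3 with stand-in classes; these are not Django models, and the fake id handling is an assumption made for illustration. In Django an unsaved instance has an id of None, so a similar check could probably go in the real save() method.

```python
class Post:
    """Stand-in for the Post model: only the counter matters here."""

    def __init__(self):
        self.lc_count = 0


class LegacyComment:
    """Stand-in for the LegacyComment model with a fake save()."""

    _next_id = 1

    def __init__(self, post, body):
        self.id = None  # like an unsaved model instance
        self.post = post
        self.body = body

    def save(self):
        is_new = self.id is None
        if is_new:
            self.id = LegacyComment._next_id
            LegacyComment._next_id += 1
            self.post.lc_count += 1  # count each comment exactly once
        # a real model would call the parent class's save() here


post = Post()
comment = LegacyComment(post, "first!")
comment.save()
comment.body = "first! (edited)"
comment.save()  # an edit: the counter must not move
print(post.lc_count)
```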

views.py

Here is an excerpt from my views.py file showing the changes:

~/src/django/myblogsite/myblogapp/views.py:

import re
from datetime import datetime
from django.shortcuts import render_to_response
from myblogsite.myblogapp.models import Post, LegacyComment

MONTH_NAMES = ('', 'January', 'February', 'March', 'April', 'May', 'June', 'July',
               'August', 'September', 'October', 'November', 'December')
MAIN_TITLE = "Sofeng's Blog 0.0.7"

def frontpage(request):
    posts, pagedata = init()
    posts = posts[:5]
    pagedata.update({'post_list': posts,
                     'subtitle': '',})
    return render_to_response('listpage.html', pagedata)

def singlepost(request, year, month, slug2):
    posts, pagedata = init()
    post = posts.get(date_created__year=year,
                     date_created__month=int(month),
                     slug=slug2,)
    legacy_comments = LegacyComment.objects.filter(post=post.id)
    pagedata.update({'post': post,
                     'lc_list': legacy_comments,})
    return render_to_response('singlepost.html', pagedata)

Templates

In the list page template I used the truncatewords_html template filter to show a 50 word post summary on the list pages instead of the full post. I also added the legacy comment count to the Django free comment count to display the total number of comments.

Excerpt from ~/src/django/myblogsite/templates/listpage.html:

{% block main %}
  <br>
  {% for post in post_list %}
    <h4><a href="/blog/{{ post.date_created|date:"Y/m" }}/{{ post.slug }}/">
        {{ post.title }}</a>
    </h4>
    {{ post.body|truncatewords_html:"50" }}
    <a href="{{ post.get_absolute_url }}">Read more...</a><br>
    <br>
    <hr>
    <div class="post_footer">
      {% ifnotequal post.date_modified.date post.date_created.date %}
        Last modified: {{ post.date_modified.date }}<br>
      {% endifnotequal %}
      Date created: {{ post.date_created.date }}<br>
      Tags: 
      {% for tag in post.get_tag_list %}
        <a href="/blog/tag/{{ tag }}/">{{ tag }}</a>{% if not forloop.last %}, {% endif %}
      {% endfor %}
      <br>

      {% get_free_comment_count for myblogapp.post post.id as comment_count %}
      <a href="{{ post.get_absolute_url }}#comments">
        {{ comment_count|add:post.lc_count }} 
        Comment{{ comment_count|add:post.lc_count|pluralize}}</a>

    </div>
    <br>
  {% endfor %}
{% endblock %}
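The comment count logic in the template is simple, but it is easy to get the plural wrong at 0 and 1. Here is the same arithmetic as a plain Python 3 sketch. The function name `comment_label` is made up, and it mimics the pluralize filter under the assumption that it appends "s" for every value except 1.

```python
def comment_label(free_count, legacy_count):
    """Mimic '{{ n }} Comment{{ n|pluralize }}' for the combined count."""
    total = free_count + legacy_count
    suffix = "" if total == 1 else "s"
    return "%d Comment%s" % (total, suffix)


for counts in [(0, 0), (1, 0), (2, 3)]:
    print(comment_label(*counts))
```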

In the single post template, I added the display of the Legacy comments in addition to the Django free comments.

Excerpt from ~/src/django/myblogsite/templates/singlepost.html:

 <a name="comments"></a>
  {% if lc_list %}
    <h4>{{ lc_list|length }} Legacy Comment{{lc_list|length|pluralize}}</h4>
  {% endif %}
  {% for legacy_comment in lc_list %}
    <br>
    <a name="lc{{ legacy_comment.id }}" href="#lc{{ legacy_comment.id }}">
      #{{ forloop.counter }}</a>
    {% if legacy_comment.website %}
      <a href="{{ legacy_comment.website }}">
        <b>{{ legacy_comment.author|escape }}</b></a> 
    {% else %}
      <b>{{ legacy_comment.author|escape }}</b>
    {% endif %}
    commented,
    on {{ legacy_comment.date_created|date:"F j, Y" }} 
    at {{ legacy_comment.date_created|date:"P" }}:
    {{ legacy_comment.body }}
  {% endfor %}
  <br>

That's it. Hopefully, I can start using my new blog soon. Please browse around on the new Django site and let me know if you run across any problems. When everything looks to be OK, I'll start posting only on my new Django site.

Here is a screenshot of version 0.0.8:

The live site can be viewed at: http://saltycrane.com/blog

How to get the current date and time in Python

2008-06-26T13:06:00-07:00

Here is an example of how to get the current date and time using the datetime module in Python:

import datetime

now = datetime.datetime.now()

print
print "Current date and time using str method of datetime object:"
print str(now)

print
print "Current date and time using instance attributes:"
print "Current year: %d" % now.year
print "Current month: %d" % now.month
print "Current day: %d" % now.day
print "Current hour: %d" % now.hour
print "Current minute: %d" % now.minute
print "Current second: %d" % now.second
print "Current microsecond: %d" % now.microsecond

print
print "Current date and time using strftime:"
print now.strftime("%Y-%m-%d %H:%M")

print
print "Current date and time using isoformat:"
print now.isoformat()

Results:

Current date and time using str method of datetime object:
2014-09-26 16:34:40.278298

Current date and time using instance attributes:
Current year: 2014
Current month: 9
Current day: 26
Current hour: 16
Current minute: 34
Current second: 40
Current microsecond: 278298

Current date and time using strftime:
2014-09-26 16:34

Current date and time using isoformat:
2014-09-26T16:34:40.278298

Directly from the time module documentation, here are more options to use with strftime:

Directive	Meaning	Notes
`%a`	Locale's abbreviated weekday name.
`%A`	Locale's full weekday name.
`%b`	Locale's abbreviated month name.
`%B`	Locale's full month name.
`%c`	Locale's appropriate date and time representation.
`%d`	Day of the month as a decimal number [01,31].
`%H`	Hour (24-hour clock) as a decimal number [00,23].
`%I`	Hour (12-hour clock) as a decimal number [01,12].
`%j`	Day of the year as a decimal number [001,366].
`%m`	Month as a decimal number [01,12].
`%M`	Minute as a decimal number [00,59].
`%p`	Locale's equivalent of either AM or PM.	(1)
`%S`	Second as a decimal number [00,61].	(2)
`%U`	Week number of the year (Sunday as the first day of the week) as a decimal number [00,53]. All days in a new year preceding the first Sunday are considered to be in week 0.	(3)
`%w`	Weekday as a decimal number [0(Sunday),6].
`%W`	Week number of the year (Monday as the first day of the week) as a decimal number [00,53]. All days in a new year preceding the first Monday are considered to be in week 0.	(3)
`%x`	Locale's appropriate date representation.
`%X`	Locale's appropriate time representation.
`%y`	Year without century as a decimal number [00,99].
`%Y`	Year with century as a decimal number.
`%Z`	Time zone name (no characters if no time zone exists).
`%%`	A literal "`%`" character.
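Here are a few of those directives in action. This sketch uses Python 3 print syntax and a fixed datetime (the same moment as the sample results above) so the output is repeatable. The directives that print names or AM/PM depend on the locale, so their output may differ on your machine.

```python
import datetime

# A fixed moment (the one in the sample results) so the output is repeatable.
moment = datetime.datetime(2014, 9, 26, 16, 34, 40, 278298)

print(moment.strftime("%Y-%m-%d %H:%M"))  # date and 24-hour time
print(moment.strftime("%j"))              # day of the year
print(moment.strftime("%w"))              # weekday number, 0 is Sunday
print(moment.strftime("%y"))              # two-digit year
print(moment.strftime("%I:%M %p"))        # 12-hour clock; AM/PM text depends on the locale
print(moment.strftime("%A, %B %d"))       # day and month names depend on the locale
```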

/usr/bin/python: bad interpreter: Permission denied error

2008-05-12T09:59:00-07:00

I have a Python script, myscript.py with a #!/usr/bin/python shebang* at the top and tried to execute it on Ubuntu Linux using ./myscript.py. I got the following error message:

bash: ./myscript.py: /usr/bin/python: bad interpreter: Permission denied

Here are things to check:

The file should be executable (use chmod +x myscript.py)
The file should have Unix line endings
The file shouldn't be on a FAT32 or NTFS filesystem. Apparently, bash can't handle scripts that are stored on FAT32 or NTFS
The shebang line must be less than 128 characters (on Linux)
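Most of this checklist can be automated. The following Python 3 sketch checks the execute bits, the line endings, and the shebang length for a single file. The function name `check_script` is made up, the 128 character limit is simply the figure quoted in this post, and the filesystem check is left out because it depends on how the disk is mounted.

```python
import os
import stat

MAX_SHEBANG = 128  # limit quoted in this post for Linux; treat as an assumption


def check_script(path):
    """Return a list of likely causes for a 'bad interpreter' error."""
    problems = []
    mode = os.stat(path).st_mode
    if not mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        problems.append("not executable (try chmod +x)")
    with open(path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        problems.append("no shebang line")
    if first_line.endswith(b"\r\n"):
        problems.append("Windows (CRLF) line endings")
    if len(first_line.rstrip(b"\r\n")) >= MAX_SHEBANG:
        problems.append("shebang line too long")
    return problems
```

An empty list means none of the checked conditions were found, for example `check_script("myscript.py")`.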

UPDATE 2015-09-23: 7 years later, I ran into this error again on our Jenkins server... A shebang line that is too long will also cause this error. On Linux, the maximum length is 128 characters.

* #!/usr/bin/env python would be the more portable shebang.

Django Blog Project #1: Creating a basic blog

2008-05-09T23:41:00-07:00

It's been a while since my last post on Django. I became very busy ~~but also found the Django tutorial to be somewhat dry~~ and also somewhat lazy*. Luckily, the official Django Book was published during this time and it is more interesting to read. This post will be a very brief summary of the first 6 chapters of the Django Book as I apply it towards the creation of a new blog website. I highly recommend reading the book (I forgot to mention it is available free online). Then after reading the first six chapters, I hope this post can serve as kind of a refresher on how to put everything together.

As I mentioned, I decided to create my own blog site as my first Django project. I know it is not the most original idea in the world, but I thought it would be useful and a good learning experience. The following steps are my first cut at my new blog (dubbed 0.0.1) and basically just document the basic concepts of Django without providing much actual functionality.

I develop a model, a template, and a view in accordance with Django's MTV (see also MVC) development pattern. The model is of a blog post (aptly named Post) and contains only one attribute, the post body data (i.e. the actual text of the post). I should add in other data such as the title, date, tags, etc. But in order to keep things simple this first time around, it just has the post body. The model is connected to a SQLite database and is updated using Django's excellent admin interface. Finally, the model data is combined with a very basic template which just displays a title (My New Blog Version 0.0.1), and all the blog post bodies separated by a <hr>. Like I said, it's not very useful at this point, but I think I understand the basic concepts and how to put everything together much better now. The next step in my Django development will be to create some more interesting templates and views and add more useful data like titles and dates.

I also have a couple of related plans:

Set up hosting: I've decided to use WebFaction for my hosting but I need to set up and upload my new, almost-website there. This will probably be the subject of my next Django post.
Copy my Blogger posts over to my new site. I've already figured out how to use Beautiful Soup to screen scrape my Blogger posts and import them into SQLite. Likely I will do this further on in the process.

Here are the steps I took for my first cut at my new blog. Note, I'm running on Ubuntu so that's why I have the $ bash prompt and use /home/sofeng paths in my examples.

*Update 2008-10-09: I realize my assessment of the tutorial might have sounded critical-- in actuality, since I was job searching at the time, I was just trying to avoid sounding lazy. For the record, I find the Django documentation to be excellent and one of the highlights of the project.

Create a new project

The first thing to do after installing Django is to create a new project. Luckily, it takes just one command to create the project.

Create a new project

$ cd ~/src/django
$ django-admin.py startproject myblogsite

Take a look at the site using the development server
```
$ python manage.py runserver
```
Then go to http://127.0.0.1:8000

At first I thought the admin interface was kind of boring. However, for my blog site, I will use the admin interface to enter new blog posts.

Edit myblogsite/settings.py to add the admin application to the list of installed apps:

INSTALLED_APPS = (
   'django.contrib.auth',
   'django.contrib.contenttypes',
   'django.contrib.sessions',
   'django.contrib.sites',
   'django.contrib.admin',
)

Install database tables for the admin interface:

$ python manage.py syncdb

At this point I was asked to create a superuser to log into the admin interface. I answered "yes" and filled in the appropriate information.

Creating table auth_message
Creating table auth_group
Creating table auth_user
Creating table auth_permission
Creating table django_content_type
Creating table django_session
Creating table django_site
Creating table django_admin_log

You just installed Django's auth system, which means you don't have any superusers defined.
Would you like to create one now? (yes/no): yes
Username (Leave blank to use 'sofeng'): sofeng
E-mail address: sofeng@sofeng.com
Password:
Password (again):
Superuser created successfully.
Installing index for auth.Message model
Installing index for auth.Permission model
Installing index for admin.LogEntry model
Loading 'initial_data' fixtures...
No fixtures found.

Edit myblogsite/urls.py to include the admin url.

from django.conf.urls.defaults import *

urlpatterns = patterns('',
   (r'^admin/', include('django.contrib.admin.urls')),
)

Run the development server:
```
$ python manage.py runserver
```
Then go to http://127.0.0.1:8000/admin, log in, and take a look around.

Set up the SQLite3 database

I chose SQLite because it is a lightweight, simple alternative to MySQL or PostgreSQL. This makes it great for a development website.

Edit the following section in the myblogsite/settings.py file:
```
DATABASE_ENGINE = 'sqlite3'
DATABASE_NAME = '/home/sofeng/src/django/myblogsite/mydatabase.sqlite3'
```
The rest of the DATABASE_ variables are not used with SQLite.
Test out the database configuration: Run the shell:
```
$ python manage.py shell
```
Then type these commands:
```
>>> from django.db import connection
>>> cursor = connection.cursor()
```
If nothing happens, all is good. See Table 5-2 in Chapter 5 of the Django Book for common database configuration error messages.

Create an App

I think of an "app" as a piece of specific functionality of a website, whereas a project corresponds to a particular website. There can be many apps in a project. Also, apps can be used in more than one project. For more information about the differences between projects and apps see Chapter 5 of the Django Book.

Create an app

$ cd ~/src/django/myblogsite
$ python manage.py startapp myblogapp

Create a Model

I created one model, the Post model. A model roughly corresponds to a SQL table, and each attribute in that model corresponds to a table column. I added the class Admin: so that my Post model would show up in the Admin interface (where I can insert the data).

Edit myblogsite/myblogapp/models.py to look like the following:

from django.db import models

class Post(models.Model):
   body = models.TextField()

   # in the future I will add these other attributes
#    title = models.CharField(maxlength=500)
#    timestamp = models.CharField(maxlength=50)
#    tags = models.CharField(maxlength=200)

   class Admin:
       pass

Correction 7/6/2008: For the Post's body field, I previously used the line: body = models.CharField(maxlength=999999). However, thanks to Myles's comment in my post #4, I've changed this to use the more appropriate TextField.

Install the Model

After writing the Python model code, I needed to create the actual tables in the SQLite database. The following steps include a couple of checks, then I create the tables in the last step.

Edit myblogsite/settings.py file again and add the blog app to the list of installed apps:

INSTALLED_APPS = (
   'django.contrib.auth',
   'django.contrib.contenttypes',
   'django.contrib.sessions',
   'django.contrib.sites',
   'django.contrib.admin',
   'myblogsite.myblogapp',
)

Try validating the model:
```
$ python manage.py validate
```
Which gives the following message:
```
0 errors found.
```
Check the CREATE TABLE statements that Django will generate. Note, the database won't be modified.
```
$ python manage.py sqlall myblogapp
```
Which yields the following:
```
BEGIN;
CREATE TABLE "myblogapp_post" (
   "id" integer NOT NULL PRIMARY KEY,
   "body" text NOT NULL
);
COMMIT;
```
Correction 7/6/2008: I've updated the results here to reflect the correction I made to the model above.
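To get a feel for what this table does, the statement can be tried outside of Django with Python's built-in sqlite3 module. This is only a sketch written for Python 3: it assumes an in-memory database instead of mydatabase.sqlite3, and the inserted text is made up.

```python
import sqlite3

# Assumption: an in-memory database stands in for mydatabase.sqlite3,
# so nothing on disk is touched.
conn = sqlite3.connect(":memory:")

# The same CREATE TABLE statement that sqlall printed above.
conn.execute("""
    CREATE TABLE "myblogapp_post" (
        "id" integer NOT NULL PRIMARY KEY,
        "body" text NOT NULL
    )
""")

# Insert one row; SQLite assigns the integer primary key automatically.
conn.execute('INSERT INTO "myblogapp_post" ("body") VALUES (?)', ("Hello",))
conn.commit()

rows = conn.execute('SELECT "id", "body" FROM "myblogapp_post"').fetchall()
print(rows)
conn.close()
```

Each Post object ends up as one row in this table, with the body attribute stored in the body column.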

Now, actually create the tables in SQLite:

$ python manage.py syncdb

Which yields something like this:

Creating table myblogapp_post
Loading 'initial_data' fixtures...
No fixtures found.

Create some new data using the admin interface

Now that I created the models and tied them to the admin interface, I can start adding data using the admin interface.

Start the development server again:
```
$ python manage.py runserver
```
Go to http://127.0.0.1:8000/admin and log in.
Under the "Blog" heading, click "Posts", then add some new posts using "Add post" and the "Save" links. This will add data to the SQLite database.

Create a template

Now I will display the data I just created using a template and a view. The template holds all the HTML code and some simple Django template code which the view's Python code uses to customize the page.

Create the file myblogsite/templates/mytemplate.html and put the following inside:

<html>
 <head><title>Post</title></head>
 <body>
   <h1>My New Blog Version 0.0.1</h1>

   {% for post in post_list %}
   {{ post }}
   <hr />
   {% endfor %}

 </body>
</html>

Edit myblogsite/settings.py again to instruct Django where to find the template files.
```
TEMPLATE_DIRS = (
   '/home/sofeng/src/django/myblogsite/templates',
)
```
Be sure to include the comma at the end.
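The comma matters because parentheses alone do not make a tuple in Python. The short sketch below (Python 3, with a made-up path) shows the difference: without the comma, the value is a plain string rather than a one-element tuple.

```python
# Hypothetical path, used only for illustration.
with_comma = ('/path/to/templates',)
without_comma = ('/path/to/templates')

print(type(with_comma).__name__)     # tuple
print(type(without_comma).__name__)  # str
```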

Create a view

The view is where I will grab the data from my model and insert it into my template.

Create a new file myblogsite/myblogapp/views.py and put the following inside:
```
from django.shortcuts import render_to_response
from myblogsite.myblogapp.models import Post

def myview(request):
   posts = Post.objects.all()
   post_body_list = [post.body for post in posts]
   return render_to_response('mytemplate.html',
                             {'post_list': post_body_list})
```
Correction 7/6/2008: I previously had from myblogapp.models import Post on the second line. This works, but is inconsistent with my urls.py below and can (and did for me) cause subtle errors in the future. I corrected the line to read: from myblogsite.myblogapp.models import Post.

Map a URL to the new view

Finally, I map a URL to my newly created view.

Edit myblogsite/urls.py so that it looks like:

from django.conf.urls.defaults import *
from myblogsite.myblogapp.views import myview

urlpatterns = patterns('',
   (r'^admin/', include('django.contrib.admin.urls')),
   (r'^myview/$', myview),
)

Take a look at the new page: Run the server:
```
$ python manage.py runserver
```
Then go to http://127.0.0.1:8000/myview to see all the posts I entered through the admin interface. Nice. Here is a screenshot of my new blog:

That's it for now. I tried to map out the basic steps for using Django's MTV development pattern. Hopefully, in the future, I'll be able to add more useful features to my new Django-powered blog.

How to use Python's enumerate and zip to iterate over two lists and their indices.

2008-04-18T15:22:00-07:00

enumerate - Iterate over indices and items of a list

The Python Cookbook (Recipe 4.4) describes how to iterate over items and indices in a list using enumerate. For example:

alist = ['a1', 'a2', 'a3']

for i, a in enumerate(alist):
    print i, a

Results:

0 a1
1 a2
2 a3

zip - Iterate over two lists in parallel

I previously wrote about using zip to iterate over two lists in parallel. Example:

alist = ['a1', 'a2', 'a3']
blist = ['b1', 'b2', 'b3']

for a, b in zip(alist, blist):
    print a, b

Results:

a1 b1
a2 b2
a3 b3

enumerate with zip

Here is how to iterate over two lists and their indices using enumerate together with zip:

alist = ['a1', 'a2', 'a3']
blist = ['b1', 'b2', 'b3']

for i, (a, b) in enumerate(zip(alist, blist)):
    print i, a, b

Results:

0 a1 b1
1 a2 b2
2 a3 b3
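Two details are worth knowing when combining these: zip stops at the end of the shorter list, and enumerate accepts a start argument if the count should not begin at 0. Here is a small sketch of both, written for Python 3 (so print is a function); the shortened blist is my own variation on the example above.

```python
alist = ['a1', 'a2', 'a3']
blist = ['b1', 'b2']  # deliberately shorter than alist

# zip stops at the end of the shorter list, so 'a3' is never visited.
# start=1 makes the counter begin at 1 instead of 0.
pairs = []
for i, (a, b) in enumerate(zip(alist, blist), start=1):
    pairs.append((i, a, b))
    print(i, a, b)
```

Results:

```
1 a1 b1
2 a2 b2
```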

Recommended books

2008-04-17T19:13:00-07:00

I love having a subscription to Safari Books Online. Currently my company provides a free subscription, but if I get a new job, I might consider subscribing myself. Since I get to browse a number of books at no cost, I thought I'd note which books are my favorites. (Note, I am not being paid by Safari Books Online.)

General Software

Structure and Interpretation of Computer Programs, Second Edition, Harold Abelson and Gerald Jay Sussman, MIT Press, ?year?
I learned about this book through a job posting. It might bring you to tears if you get it. I'm only in the second chapter. It is used in an introductory Computer Science course at MIT. It uses Scheme (Lisp) to demonstrate concepts.
Available free online at: http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-4.html
A free video lecture series is also available.

C

The C Programming Language, Second Edition, Brian W. Kernighan and Dennis M. Ritchie, Prentice Hall, 1988
The definitive C book.

Python

Core Python Programming, Second Edition, Wesley J. Chun, Prentice Hall, September 18, 2006
Usually I like O'Reilly books best, but I slightly prefer Chun's text to Learning Python.
Available at Safari Books Online
The Django Book, Apress, December 2007
I think this is the first official Django book.

SQLite

The Definitive Guide to SQLite, Mike Owens, Apress, May 2006
I browsed a few SQL books but liked this one better than most. It has a good theory section.
Available at Apress.com

Linux or related

X Power Tools, Chris Tyler, O'Reilly, December 15, 2007
Lots of good information on the X Window System and more; easy to understand. I wish the basic Ubuntu or Linux books had some of this information.
Available at Safari Books Online
SSH, The Secure Shell: The Definitive Guide, Second Edition, Daniel J. Barrett; Richard E. Silverman; Robert G. Byrnes, O'Reilly, May 10, 2005
Available at Safari Books Online

Ruby

why's poignant guide to Ruby by why the lucky stiff
Only in chapter 3, but very funny.
Available free online at: http://poignantguide.net/ruby/

Javascript

Javascript: The Good Parts, Douglas Crockford, O'Reilly, May 2008
This is the first Javascript book I read as I switched to Javascript and Frontend development. I was debating whether to read it since it was so old but I found it to be good and recommend it.
You Don't Know JS: ES6 & Beyond, Kyle Simpson, O'Reilly, December 2015
This is the primary way I learned ES6. It is very detail oriented.
Available free online on github
JavaScript Allongé, the "Six" Edition, Reg "raganwald" Braithwaite, Leanpub, 2016
This was the third Javascript book I read and it was the most fun. It teaches functional programming concepts such as closures, shadowing, higher order functions, combinators, decorators, etc using ES6.
Available free online at: https://leanpub.com/javascriptallongesix/read

Non technical

The Hitchhiker's Guide to the Galaxy by Douglas Adams
Calvin and Hobbes by Bill Watterson
Crime and Punishment by Fyodor Dostoyevsky
Mere Christianity by C. S. Lewis
Screwtape Letters by C. S. Lewis

Working with files and directories in Python

2008-04-14T14:52:00-07:00

I often have a difficult time finding the appropriate functions to work with files and directories in Python. I think one reason is that the Library Reference puts seemingly related functions in two different places: 11 File and Directory Access and 14.1.4 Files and Directories.

Section 11, File and Directory Access, contains documentation for useful functions such as os.path.exists, which checks if a path exists; glob.glob, which is useful for matching filenames using the Unix-style * and ? wildcards; and shutil.copy, which is similar to the Unix cp command.

This section includes a total of 11 subsections for modules related to file and directory access, but it does not contain basic commands such as os.chdir, os.listdir, or os.rename. These functions are documented in Section 14.1.4, Files and Directories, instead.

Here is a summary of some of the functions I find useful. Check the documentation for more complete and detailed information.

11 File and Directory Access

os.path module:

exists - checks if a path or file exists
Example:
```
import os.path
print os.path.exists("c:/Windows")
```
Results:
```
True
```
isfile and isdir - test if the path is a file or directory, respectively.
Example:
```
import os.path
print os.path.isfile("c:/Windows")
print os.path.isdir("c:/Windows")
```
Results:
```
False
True
```

getmtime - returns the modification time of a path
Example:

import os.path
import time
mtime = os.path.getmtime("c:/Windows")
print time.gmtime(mtime)

Results:

(2008, 4, 2, 15, 58, 39, 2, 93, 0)

glob module:

glob: returns a list of paths matching a Unix-style glob pattern.
Example:

import glob
print glob.glob("c:/windows/*.bmp")

Results:

['c:/windows\\Blue Lace 16.bmp', 'c:/windows\\Coffee Bean.bmp', 'c:/windows\\default.bmp', 'c:/windows\\FeatherTexture.bmp', 'c:/windows\\Gone Fishing.bmp', 'c:/windows\\Greenstone.bmp', 'c:/windows\\Prairie Wind.bmp', 'c:/windows\\Rhododendron.bmp', 'c:/windows\\River Sumida.bmp', 'c:/windows\\Santa Fe Stucco.bmp', 'c:/windows\\Soap Bubbles.bmp', 'c:/windows\\winnt.bmp', 'c:/windows\\winnt256.bmp', 'c:/windows\\Zapotec.bmp']

shutil module:
- copy - similar to Unix cp
- copy2 - similar to Unix cp -p
- copytree - similar to Unix cp -r
- rmtree - similar to Unix rm -r
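Here is a rough sketch of the four shutil functions in action. It is written for Python 3 and works inside a temporary directory (my assumption, so that no real files are touched); the file names are made up.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so no real files are affected.
base = tempfile.mkdtemp()
src_dir = os.path.join(base, "src")
os.mkdir(src_dir)

src_file = os.path.join(src_dir, "a.txt")
with open(src_file, "w") as f:
    f.write("hello")

# copy: copies the file contents and permission bits (like cp)
shutil.copy(src_file, os.path.join(src_dir, "b.txt"))

# copy2: like copy, but also tries to preserve metadata such as mtime (like cp -p)
shutil.copy2(src_file, os.path.join(src_dir, "c.txt"))

# copytree: recursively copies a whole directory (like cp -r);
# by default the destination must not already exist
dst_dir = os.path.join(base, "dst")
shutil.copytree(src_dir, dst_dir)
copied = sorted(os.listdir(dst_dir))
print(copied)

# rmtree: recursively deletes a directory tree (like rm -r)
shutil.rmtree(base)
still_there = os.path.exists(base)
print(still_there)
```

Results:

```
['a.txt', 'b.txt', 'c.txt']
False
```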

14 Generic Operating System Services -> 14.1 os --- Miscellaneous operating system interfaces -> 14.1.4 Files and Directories

chdir - change the current working directory
getcwd - return a string representing the current working directory
listdir - return a list of the names of the entries in the directory.
mkdir - create a directory
remove - remove a file (this is identical to unlink)
rename - rename the file or directory
walk - walks a directory tree (see my os.walk example)
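The functions in this list can be strung together as in the sketch below. It is written for Python 3 and, as an assumption to keep it safe to run, does all of its work inside a temporary directory with made-up file names.

```python
import os
import tempfile

start_dir = os.getcwd()          # getcwd: remember where we started
base = tempfile.mkdtemp()        # throwaway directory (an assumption for this sketch)

os.chdir(base)                   # chdir: change the current working directory
os.mkdir("subdir")               # mkdir: create a directory

old_path = os.path.join("subdir", "old.txt")
new_path = os.path.join("subdir", "new.txt")
with open(old_path, "w") as f:
    f.write("data")
os.rename(old_path, new_path)    # rename: rename the file

listing = os.listdir("subdir")   # listdir: names of the entries in the directory
print(listing)

# walk: yields (dirpath, dirnames, filenames) for each directory in the tree
walked = []
for dirpath, dirnames, filenames in os.walk("."):
    walked.append((dirpath, sorted(dirnames), sorted(filenames)))

# clean up
os.remove(new_path)              # remove: delete a file
os.rmdir("subdir")               # rmdir: delete an empty directory
os.chdir(start_dir)
os.rmdir(base)
```

After the loop, walked holds one tuple for the temporary directory itself and one for subdir, which contains only new.txt.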

PyQt: How to pass arguments while emitting a signal

2008-01-29T17:33:00-08:00

I often forget how to do this so I'm documenting it here for future reference. If I want to emit a signal and also pass an argument with that signal, I can use the form self.emit(SIGNAL("mySignalName"), myarg). I connect the signal to a method in the usual way. To use the argument, I merely need to specify the argument in the method definition. What often confuses me is that I don't need to specify arguments in the connect statement. The example below emits a signal didSomething and passes two arguments, "important" and "information" to the update_label method.

import sys
import time
from PyQt4.QtCore import * 
from PyQt4.QtGui import * 

#################################################################### 
class MyWindow(QWidget): 
    def __init__(self, *args): 
        QWidget.__init__(self, *args)

        self.label = QLabel(" ")
        layout = QVBoxLayout()
        layout.addWidget(self.label)
        self.setLayout(layout)
        self.connect(self, SIGNAL("didSomething"),
                     self.update_label)
        self.do_something()

    def do_something(self):
        self.emit(SIGNAL("didSomething"), "important", "information")

    def update_label(self, value1, value2):
        self.label.setText(value1 + " " + value2)

####################################################################
if __name__ == "__main__": 
    app = QApplication(sys.argv) 
    w = MyWindow() 
    w.show() 
    sys.exit(app.exec_())

PyQt4 QItemDelegate example with QListView and QAbstractListModel

2008-01-23T18:13:00-08:00

I am currently working on a mini project which uses a QListView to display items in a list box. I am happy with most of the default behavior in the list view; however, I want to change how the highlighting of selected items is displayed. Currently, in my Windows environment, selecting an item in the list highlights the item in dark blue. This is fine, but when the list box loses focus, the highlight color turns to a light gray, which is hard for me to see. I would like the selection to have a red highlight, whether the widget has focus or not.

My solution is to add a custom delegate to my list view. Normally, a standard view uses a default delegate (QItemDelegate) to render and edit the model's data. To customize the way the data is displayed in the view, I subclass QItemDelegate and implement a custom paint() method to set the background color to red for selected items. (Note, it is possible to specify certain formatting (including background color) using ItemDataRoles in the QAbstractListModel subclass, however, using a custom delegate is more powerful, and I didn't want to mix appearance-related code with my data model.)

In the example below, I started with the simple QListView / QAbstractListModel example, and added MyDelegate, a subclass of QItemDelegate. This class reimplements the paint() method to highlight selected items in red.

import sys
from PyQt4.QtCore import *
from PyQt4.QtGui import *

####################################################################
def main():
    app = QApplication(sys.argv)
    w = MyWindow()
    w.show()
    sys.exit(app.exec_())

####################################################################
class MyWindow(QWidget):
    def __init__(self, *args):
        QWidget.__init__(self, *args)

        # create objects
        list_data = [1,2,3,4]
        lm = MyListModel(list_data, self)
        de = MyDelegate(self)
        lv = QListView()
        lv.setModel(lm)
        lv.setItemDelegate(de)

        # layout
        layout = QVBoxLayout()
        layout.addWidget(lv)
        self.setLayout(layout)

####################################################################
class MyDelegate(QItemDelegate):
    def __init__(self, parent=None, *args):
        QItemDelegate.__init__(self, parent, *args)

    def paint(self, painter, option, index):
        painter.save()

        # set background color
        painter.setPen(QPen(Qt.NoPen))
        if option.state & QStyle.State_Selected:
            painter.setBrush(QBrush(Qt.red))
        else:
            painter.setBrush(QBrush(Qt.white))
        painter.drawRect(option.rect)

        # set text color
        painter.setPen(QPen(Qt.black))
        value = index.data(Qt.DisplayRole)
        if value.isValid():
            text = value.toString()
            painter.drawText(option.rect, Qt.AlignLeft, text)

        painter.restore()

####################################################################
class MyListModel(QAbstractListModel):
    def __init__(self, datain, parent=None, *args):
        """ datain: a list where each item is a row
        """
        QAbstractListModel.__init__(self, parent, *args)
        self.listdata = datain

    def rowCount(self, parent=QModelIndex()):
        return len(self.listdata)

    def data(self, index, role):
        if index.isValid() and role == Qt.DisplayRole:
            return QVariant(self.listdata[index.row()])
        else:
            return QVariant()

####################################################################
if __name__ == "__main__":
    main()

How to invert a dict in Python

2008-01-14T13:34:00-08:00

Example 1: If the values in the dictionary are unique and hashable, then I can use Recipe 4.14 in the Python Cookbook, 2nd Edition.

def invert_dict(d):
    return dict([(v, k) for k, v in d.iteritems()])

d = {'child1': 'parent1',
     'child2': 'parent2',
     }
print invert_dict(d)

{'parent2': 'child2', 'parent1': 'child1'}

Example 2: If the values in the dictionary are hashable, but not unique, I can create a dict of lists as an inverse.

def invert_dict_nonunique(d):
    newdict = {}
    for k, v in d.iteritems():
        newdict.setdefault(v, []).append(k)
    return newdict

d = {'child1': 'parent1',
     'child2': 'parent1',
     'child3': 'parent2',
     'child4': 'parent2',
     }
print invert_dict_nonunique(d)

{'parent2': ['child3', 'child4'], 'parent1': ['child1', 'child2']}

Example 3: If I am starting with a dict of lists, where lists contain unique hashable items, I can create an inverse as shown below.

def invert_dol(d):
    return dict((v, k) for k in d for v in d[k])

d = {'child1': ['parent1'],
     'child2': ['parent2', 'parent3'],
     }
print invert_dol(d)

{'parent3': 'child2', 'parent2': 'child2', 'parent1': 'child1'}

Example 4: If I am starting with a dict of lists, where lists contain non-unique hashable items, I can create another dict of lists as an inverse.

def invert_dol_nonunique(d):
    newdict = {}
    for k in d:
        for v in d[k]:
            newdict.setdefault(v, []).append(k)
    return newdict

d = {'child1': ['parent1'],
     'child2': ['parent1'],
     'child3': ['parent2'],
     'child4': ['parent2'],
     'child5': ['parent1', 'parent2'],
     }
print invert_dol_nonunique(d)

{'parent2': ['child3', 'child4', 'child5'], 'parent1': ['child1', 'child2', 'child5']}
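For newer Python versions, Examples 1 and 2 can be written a little more compactly. The sketch below assumes Python 3.7 or later (where d.items() replaces d.iteritems() and dicts keep insertion order); using a dict comprehension and collections.defaultdict is my substitution, not the Cookbook recipe.

```python
from collections import defaultdict

def invert_dict(d):
    """Invert a dict whose values are unique and hashable."""
    return {v: k for k, v in d.items()}

def invert_dict_nonunique(d):
    """Invert a dict whose values may repeat; maps each value to a list of keys."""
    inverse = defaultdict(list)
    for k, v in d.items():
        inverse[v].append(k)
    return dict(inverse)

unique_result = invert_dict({'child1': 'parent1', 'child2': 'parent2'})
nonunique_result = invert_dict_nonunique({'child1': 'parent1',
                                          'child2': 'parent1',
                                          'child3': 'parent2'})
print(unique_result)
print(nonunique_result)
```

Results:

```
{'parent1': 'child1', 'parent2': 'child2'}
{'parent1': ['child1', 'child2'], 'parent2': ['child3']}
```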

Notes on Python variable scope

2008-01-09T17:06:00-08:00

Example 1: The difference between global and local variables

Global variables are accessible inside and outside of functions. Local variables are only accessible inside the function. In the example below, the function can access both the global and the local variable. However, trying to access the local variable outside the function produces an error.

global_var = 'foo'
def ex1():
    local_var = 'bar'
    print global_var
    print local_var

ex1()
print global_var
print local_var  # this gives an error

foo
bar
foo
Traceback (most recent call last):
  File "nested_scope.py", line 12, in <module>
    print local_var  # this gives an error
NameError: name 'local_var' is not defined

Example 2: How not to set a global variable

*Setting* a global variable from within a function is not as simple. If I set a variable in a function with the same name as a global variable, I am actually creating a new local variable. In the example below, var remains 'foo' even after the function is called.

var = 'foo'
def ex2():
    var = 'bar'
    print 'inside the function var is ', var

ex2()
print 'outside the function var is ', var

inside the function var is  bar
outside the function var is  foo

Example 3: How to set a global variable

To set the global variable inside a function, I need to use the global statement. This declares the inner variable to have module scope. Now var remains 'bar' after the function is called.

var = 'foo'
def ex3():
    global var
    var = 'bar'
    print 'inside the function var is ', var

ex3()
print 'outside the function var is ', var

inside the function var is  bar
outside the function var is  bar

Example 4: Nested functions

Scoping for nested functions works similarly. In the example below, the inner function can access both var_outer and var_inner. However, the outer function cannot access var_inner. Side note: the inner function is considered a closure if it makes reference to a non-global outside variable.

def ex4():
    var_outer = 'foo'
    def inner():
        var_inner = 'bar'
        print var_outer
        print var_inner
    inner()
    print var_outer
    print var_inner # this gives an error

ex4()

foo
bar
foo
Traceback (most recent call last):
  File "nested_scope.py", line 53, in <module>
    ex4()
  File "nested_scope.py", line 51, in ex4
    print var_inner # this gives an error
NameError: global name 'var_inner' is not defined

Example 5: How not to set an outer variable

Like Example 2, setting a variable in the inner function creates a new local variable instead of modifying the outer variable. In the example below, var in the outer function does not get changed to 'bar'.

def ex5():
    var = 'foo'
    def inner():
        var = 'bar'
        print 'inside inner, var is ', var
    inner()
    print 'inside outer function, var is ', var

ex5()

inside inner, var is  bar
inside outer function, var is  foo

Example 6: Another way to not set an outer variable

Using the global keyword won't work in this case either. global causes a variable to have module scope, but I want my variable to have the scope of the outer function. Per the Python 3000 Status Update, Python 3000 will have a nonlocal keyword to solve this problem. See PEP 3104 for more information about nonlocal and nested scopes. In the example below, var is still not changed to 'bar' in the outer function.

def ex6():
    var = 'foo'
    def inner():
        global var
        var = 'bar'
        print 'inside inner, var is ', var
    inner()
    print 'inside outer function, var is ', var

ex6()

inside inner, var is  bar
inside outer function, var is  foo
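For anyone already on Python 3, here is a sketch of how the nonlocal keyword from PEP 3104 handles this case. The function name ex6_nonlocal is made up for this illustration, and the code will not run on Python 2.

```python
def ex6_nonlocal():
    var = 'foo'
    def inner():
        nonlocal var   # rebind var in the enclosing function, not a new local
        var = 'bar'
        print('inside inner, var is', var)
    inner()
    print('inside outer function, var is', var)
    return var

result = ex6_nonlocal()
```

Results:

```
inside inner, var is bar
inside outer function, var is bar
```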

Example 7: A workaround until Python 3000 arrives

A workaround is to create an empty class to use as an additional namespace. Now the variable in the outer function can be set to 'bar'.

class Namespace: pass
def ex7():
    ns = Namespace()
    ns.var = 'foo'
    def inner():
        ns.var = 'bar'
        print 'inside inner, ns.var is ', ns.var
    inner()
    print 'inside outer function, ns.var is ', ns.var
ex7()

inside inner, ns.var is  bar
inside outer function, ns.var is  bar

Example 8: Alternative to Example 7

Update 2010-03-01: According to Alexander's comment below, this is not a good way to do things.

I learned about this method from Nihiliad's comment on my recursion example. To me, this seems like a more elegant alternative to the solution in Example 7.

def ex8():
    ex8.var = 'foo'
    def inner():
        ex8.var = 'bar'
        print 'inside inner, ex8.var is ', ex8.var
    inner()
    print 'inside outer function, ex8.var is ', ex8.var
ex8()

inside inner, ex8.var is  bar
inside outer function, ex8.var is  bar

Reference

Core Python Programming, Second Edition, Ch 11

Saving a Python dict to a file using pickle

2008-01-07T18:00:00-08:00

Per Programming Python, 3rd Edition, there are a number of methods to store persistent data with Python:

- I often use flat files to read or write text (string) data using the os library.
- Flat files are read sequentially, but dbm files allow for keyed access to string data.
- The pickle module can be used to store non-string Python data structures, such as Python dicts. However, the data is not keyed as with dbm files.
- shelve files combine the best of the dbm and pickle methods by storing pickled objects in dbm keyed files.
- I've read good things about the ZODB object-oriented database, but I don't know too much about it. Per the book, it is a more powerful alternative to shelves.
- The final option is interfacing with a full-fledged SQL relational database. As I mentioned before, Python 2.5 has an interface to SQLite as part of the standard distribution.

Here is an example using pickle which writes a Python dict to a file and reads it back again:

import pickle

# write python dict to a file
mydict = {'a': 1, 'b': 2, 'c': 3}
output = open('myfile.pkl', 'wb')
pickle.dump(mydict, output)
output.close()

# read python dict back from the file
pkl_file = open('myfile.pkl', 'rb')
mydict2 = pickle.load(pkl_file)
pkl_file.close()

print mydict
print mydict2

Results:

{'a': 1, 'c': 3, 'b': 2}
{'a': 1, 'c': 3, 'b': 2}
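For comparison, here is a rough sketch of the shelve approach mentioned in the list above, which gives keyed access to pickled objects. It is written for Python 3; the temporary directory, the file name myshelf, and the stored values are assumptions made for the example.

```python
import os
import shelve
import shutil
import tempfile

# Assumption: a temporary directory holds the shelf so cleanup is easy.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "myshelf")

# write: each key is stored separately, and the values are pickled automatically
with shelve.open(path) as db:
    db["numbers"] = {'a': 1, 'b': 2, 'c': 3}
    db["names"] = ['alice', 'bob']

# read: fetch a single entry by key without loading the others
with shelve.open(path) as db:
    numbers = db["numbers"]
    keys = sorted(db.keys())

print(numbers)
print(keys)

shutil.rmtree(tmpdir)
```

Unlike the single pickle file, the shelf can hold many objects and return any one of them by its key.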

Python PyQt Tab Completion example

2008-01-04T12:18:00-08:00

Here is an example Python GUI that implements tab completion. It uses the open source Qt 4.3 toolkit and PyQt 4.3 Python bindings.

A list of words is presented in a list box. As the user types, the list is shortened to show possible matches. If the user presses TAB, the input text is "completed" to the longest possible string match. This may be a whole word or a common substring of multiple words.

This example consists of two basic elements:

- MyLineEdit is a subclass of the QLineEdit class. It is used as an input box to enter text. I needed to subclass QLineEdit because I needed to capture the TAB key press event for tab completion. (See this previous post.)
- QListView and MyListModel implement a list with a simple model/view architecture. MyListModel is a subclass of QAbstractListModel. I implemented the required rowCount and data methods as well as a method called setAllData which replaces the entire existing data with a new list of data.

This example makes use of two SIGNALs:

- The textChanged signal is emitted each time the user types a letter inside the QLineEdit box. It is connected to the text_changed method which updates the list of words in the QListView. MyListModel's setAllData method is used to update the data.
- The tabPressed signal is a custom signal I added to my QLineEdit subclass. It is emitted each time the user presses the TAB key. This signal is connected to the tab_pressed method which completes the input to the longest matching substring of the available words.

import sys
from PyQt4.QtCore import * 
from PyQt4.QtGui import * 

LIST_DATA = ['a', 'aardvark', 'aardvarks', 'aardwolf', 'aardwolves',
             'abacus', 'babel', 'bach', 'cache', 
             'daggle', 'facet', 'kabob', 'kansas']

#################################################################### 
def main(): 
    app = QApplication(sys.argv) 
    w = MyWindow() 
    w.show() 
    sys.exit(app.exec_()) 

#################################################################### 
class MyWindow(QWidget): 
    def __init__(self, *args): 
        QWidget.__init__(self, *args) 

        # create objects
        self.la = QLabel("Start typing to match items in list:")
        self.le = MyLineEdit()
        self.lm = MyListModel(LIST_DATA, self)
        self.lv = QListView()
        self.lv.setModel(self.lm)

        # layout
        layout = QVBoxLayout()
        layout.addWidget(self.la)
        layout.addWidget(self.le)
        layout.addWidget(self.lv) 
        self.setLayout(layout)

        # connections
        self.connect(self.le, SIGNAL("textChanged(QString)"),
                     self.text_changed)
        self.connect(self.le, SIGNAL("tabPressed"),
                     self.tab_pressed)

    def text_changed(self):
        """ updates the list of possible completions each time a key is 
            pressed """
        pattern = str(self.le.text())
        self.new_list = [item for item in LIST_DATA if item.find(pattern) == 0]
        self.lm.setAllData(self.new_list)

    def tab_pressed(self):
        """ completes the word to the longest matching string 
            when the tab key is pressed """

        # only one item in the completion list
        if len(self.new_list) == 1:
            newtext = self.new_list[0] + " "
            self.le.setText(newtext)

        # more than one remaining matches
        elif len(self.new_list) > 1:
            match = self.new_list.pop(0)
            for word in self.new_list:
                match = string_intersect(word, match)
            self.le.setText(match)

####################################################################
class MyLineEdit(QLineEdit):
    def __init__(self, *args):
        QLineEdit.__init__(self, *args)
        
    def event(self, event):
        if (event.type()==QEvent.KeyPress) and (event.key()==Qt.Key_Tab):
            self.emit(SIGNAL("tabPressed"))
            return True
        return QLineEdit.event(self, event)

#################################################################### 
class MyListModel(QAbstractListModel): 
    def __init__(self, datain, parent=None, *args): 
        """ datain: a list where each item is a row
        """
        QAbstractListModel.__init__(self, parent, *args)
        self.listdata = datain
 
    def rowCount(self, parent=QModelIndex()): 
        return len(self.listdata) 
 
    def data(self, index, role): 
        if index.isValid() and role == Qt.DisplayRole:
            return QVariant(self.listdata[index.row()])
        else: 
            return QVariant()

    def setAllData(self, newdata):
        """ replace all data with new data """
        self.listdata = newdata
        self.reset()

####################################################################
def string_intersect(str1, str2):
    newlist = []
    for i,j in zip(str1, str2):
        if i == j:
            newlist.append(i)
        else:
            break
    return ''.join(newlist)

####################################################################
if __name__ == "__main__": 
    main()
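The completion logic itself does not depend on Qt, so it can be tried on its own. The sketch below (Python 3) is a simplified stand-in for text_changed and tab_pressed: the function names matches and complete are made up, os.path.commonprefix replaces string_intersect, and the trailing space added for a single match is left out.

```python
import os.path

WORDS = ['a', 'aardvark', 'aardvarks', 'aardwolf', 'aardwolves',
         'abacus', 'babel', 'bach', 'cache',
         'daggle', 'facet', 'kabob', 'kansas']

def matches(pattern, words=WORDS):
    """Return the words that start with pattern."""
    return [w for w in words if w.startswith(pattern)]

def complete(pattern, words=WORDS):
    """Return the longest string that every matching word starts with."""
    candidates = matches(pattern, words)
    if not candidates:
        return pattern          # nothing to complete
    # commonprefix compares character by character, like string_intersect
    return os.path.commonprefix(candidates)

print(matches('aa'))
print(complete('aa'))
print(complete('b'))
print(complete('fa'))
```

Results:

```
['aardvark', 'aardvarks', 'aardwolf', 'aardwolves']
aard
ba
facet
```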

How to use *args and **kwargs in Python

2008-01-03T17:28:00-08:00

Or, How to use variable length argument lists in Python.

The special syntax *args and **kwargs in function definitions is used to pass a variable number of arguments to a function. The single asterisk form (*args) is used to pass a non-keyworded, variable-length argument list, and the double asterisk form (**kwargs) is used to pass a keyworded, variable-length argument list. Here is an example of how to use the non-keyworded form. This example passes one formal (positional) argument and two more variable-length arguments.

def test_var_args(farg, *args):
    print "formal arg:", farg
    for arg in args:
        print "another arg:", arg

test_var_args(1, "two", 3)

Results:

formal arg: 1
another arg: two
another arg: 3

Here is an example of how to use the keyworded form. Again, one formal argument and two keyworded variable arguments are passed.

def test_var_kwargs(farg, **kwargs):
    print "formal arg:", farg
    for key in kwargs:
        print "another keyword arg: %s: %s" % (key, kwargs[key])

test_var_kwargs(farg=1, myarg2="two", myarg3=3)

Results:

formal arg: 1
another keyword arg: myarg2: two
another keyword arg: myarg3: 3

Using `*args` and `**kwargs` when calling a function

This special syntax can be used, not only in function definitions, but also when calling a function.

def test_var_args_call(arg1, arg2, arg3):
    print "arg1:", arg1
    print "arg2:", arg2
    print "arg3:", arg3

args = ("two", 3)
test_var_args_call(1, *args)

Results:

arg1: 1
arg2: two
arg3: 3

Here is an example using the keyworded form when calling a function:

def test_var_args_call(arg1, arg2, arg3):
    print "arg1:", arg1
    print "arg2:", arg2
    print "arg3:", arg3

kwargs = {"arg3": 3, "arg2": "two"}
test_var_args_call(1, **kwargs)

Results:

arg1: 1
arg2: two
arg3: 3
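
The examples above use the Python 2 print statement. For readers on Python 3, here is a minimal sketch that combines both forms in one definition and then unpacks a tuple and a dict at the call site. It is not code from the original post; the function name describe_call and the keyword names myarg4 and myarg5 are made up for illustration.

```python
def describe_call(farg, *args, **kwargs):
    """Return a list of lines describing how the arguments were received."""
    lines = ["formal arg: %s" % farg]
    # args is a tuple holding the extra positional arguments
    for arg in args:
        lines.append("another arg: %s" % arg)
    # kwargs is a dict holding the extra keyword arguments; sorting the
    # keys keeps the output order predictable
    for key in sorted(kwargs):
        lines.append("another keyword arg: %s: %s" % (key, kwargs[key]))
    return lines


# Unpack a tuple and a dict when calling the function
extra_args = ("two", 3)
extra_kwargs = {"myarg5": 5, "myarg4": "four"}
for line in describe_call(1, *extra_args, **extra_kwargs):
    print(line)
```

Returning the lines instead of printing inside the function is only a convenience for this sketch, so the result can be inspected or tested.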

How to find the intersection and union of two lists in Python

2008-01-03T16:33:00-08:00

My friend Bill had previously alerted me to the coolness of Python sets. However, I hadn't found an opportunity to use them until now. Here are three functions using sets to remove duplicate entries from a list, find the intersection of two lists, and find the union of two lists. Note that the built-in set type was introduced in Python 2.4, so Python 2.4 or later is required. Also, the items in the lists must be hashable, and the order of the original lists is not preserved.

For more information on Python sets, see the Library Reference.

""" NOTES:
      - requires Python 2.4 or greater
      - elements of the lists must be hashable
      - order of the original lists is not preserved
"""
def unique(a):
    """ return the list with duplicate elements removed """
    return list(set(a))

def intersect(a, b):
    """ return the intersection of two lists """
    return list(set(a) & set(b))

def union(a, b):
    """ return the union of two lists """
    return list(set(a) | set(b))

if __name__ == "__main__": 
    a = [0,1,2,0,1,2,3,4,5,6,7,8,9]
    b = [5,6,7,8,9,10,11,12,13,14]
    print unique(a)
    print intersect(a, b)
    print union(a, b)

Results:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[8, 9, 5, 6, 7]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
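
If the original order matters, one alternative is to use dictionary keys instead of sets. The following is a sketch, assuming Python 3.7 or later (where dicts keep insertion order). The function names unique_ordered, intersect_ordered, and union_ordered are hypothetical and not part of the original post. The items must still be hashable.

```python
def unique_ordered(a):
    """Return the list with duplicates removed, keeping first occurrences."""
    # dict keys are unique and (in Python 3.7+) remember insertion order
    return list(dict.fromkeys(a))


def intersect_ordered(a, b):
    """Return the items of a that also appear in b, in the order of a."""
    in_b = set(b)  # a set gives fast membership tests
    return [item for item in unique_ordered(a) if item in in_b]


def union_ordered(a, b):
    """Return the unique items of a followed by the new items of b."""
    return unique_ordered(list(a) + list(b))


a = [0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(unique_ordered(a))
print(intersect_ordered(a, b))
print(union_ordered(a, b))
```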

How to capture the Tab key press event with PyQt 4.3

2008-01-03T15:26:00-08:00

Normally, pressing the TAB key changes focus among widgets. However, I would like to use the TAB key for other purposes (e.g. tab completion). To gain control of the TAB key press event, I need to subclass my widget and reimplement the QObject.event() event handler. I don't need to re-write the entire event handler. I only need to process TAB key press events. I will pass all other events to the default event handler. The example below subclasses the QLineEdit widget and reimplements the event() method. Pressing the TAB key inside this new widget prints out the text "tab pressed" inside a second QLineEdit box.

The Events and Event Filters page in the Trolltech Qt documentation has a good explanation of how this works. My example shows how to use Python and PyQt instead of C++.

import sys
from PyQt4.QtCore import * 
from PyQt4.QtGui import * 

#################################################################### 
def main(): 
    app = QApplication(sys.argv) 
    w = MyWindow() 
    w.show() 
    sys.exit(app.exec_()) 

####################################################################
class MyWindow(QWidget): 
    def __init__(self, *args): 
        QWidget.__init__(self, *args)

        # create objects
        self.la = QLabel("Press tab in this box:")
        self.le = MyLineEdit()
        self.la2 = QLabel("\nLook here:")
        self.le2 = QLineEdit()

        # layout
        layout = QVBoxLayout()
        layout.addWidget(self.la)
        layout.addWidget(self.le)
        layout.addWidget(self.la2)
        layout.addWidget(self.le2)
        self.setLayout(layout)

        # connections
        self.connect(self.le, SIGNAL("tabPressed"),
                     self.update)

    def update(self):
        newtext = str(self.le2.text()) + "tab pressed "
        self.le2.setText(newtext)

####################################################################
class MyLineEdit(QLineEdit):
    def __init__(self, *args):
        QLineEdit.__init__(self, *args)
        
    def event(self, event):
        if (event.type()==QEvent.KeyPress) and (event.key()==Qt.Key_Tab):
            self.emit(SIGNAL("tabPressed"))
            return True

        return QLineEdit.event(self, event)

####################################################################
if __name__ == "__main__": 
    main()

PyQt 4.3 Simple QAbstractListModel / QListView example

2008-01-03T12:50:00-08:00

import sys
from PyQt4.QtCore import * 
from PyQt4.QtGui import * 

#################################################################### 
def main(): 
    app = QApplication(sys.argv) 
    w = MyWindow() 
    w.show() 
    sys.exit(app.exec_()) 

#################################################################### 
class MyWindow(QWidget): 
    def __init__(self, *args): 
        QWidget.__init__(self, *args) 

        # create list model and view
        list_data = [1,2,3,4]
        lm = MyListModel(list_data, self)
        lv = QListView()
        lv.setModel(lm)

        # layout
        layout = QVBoxLayout()
        layout.addWidget(lv) 
        self.setLayout(layout)

#################################################################### 
class MyListModel(QAbstractListModel): 
    def __init__(self, datain, parent=None, *args): 
        """ datain: a list where each item is a row
        """
        QAbstractListModel.__init__(self, parent, *args) 
        self.listdata = datain
 
    def rowCount(self, parent=QModelIndex()): 
        return len(self.listdata) 
 
    def data(self, index, role): 
        if index.isValid() and role == Qt.DisplayRole:
            return QVariant(self.listdata[index.row()])
        else: 
            return QVariant()

####################################################################
if __name__ == "__main__": 
    main()

How to pass command line arguments to your Python program

2007-12-20T17:35:00-08:00

Here is an example for quick reference. sys.argv holds the program name at index 0, so the loop starts at index 1.

#!/usr/bin/python

import sys

def main():
    # print command line arguments
    for arg in sys.argv[1:]:
        print arg

if __name__ == "__main__":
    main()

Try it out:

$ python cmdline_args.py arg1 arg2 arg3
arg1
arg2
arg3
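
For anything beyond a plain list of values, the standard library's argparse module (added in Python 2.7 and 3.2, so after this post was written) can do the parsing. Below is a minimal Python 3 sketch. The argument names items and --upper are invented for illustration, and the demo call passes an explicit list instead of reading sys.argv.

```python
import argparse


def parse_args(argv=None):
    """Parse command line arguments; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Print command line arguments")
    # "items" and "--upper" are hypothetical names chosen for this sketch
    parser.add_argument("items", nargs="*", help="values to print")
    parser.add_argument("--upper", action="store_true",
                        help="print the values in upper case")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    for item in args.items:
        print(item.upper() if args.upper else item)


# Demo call with an explicit argument list; a real script would call main()
main(["arg1", "arg2", "arg3"])
```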

Tabular data structure conversion in Python

2007-12-20T13:22:00-08:00

Here is a Python library to convert between various tabular data structures including list of lists, list of dicts, dict of lists, and dict of dicts. My original attempts at these conversions required that the data be rectangular (e.g. each column has the same number of elements). However, further research led me to this ASPN Recipe which uses map to transpose a list of lists even if it is not rectangular. With help from the mailing list, I rewrote the recipe without using lambda. (I did this because Guido suggested not to use map with lambda for the sake of clarity.)

I used list comprehensions wherever possible and a functional/ declarative approach in general. It is likely there is a better way to do many of these conversions. (After all, I just learned how to use zip().) In particular, the functions with the comment "Better way?" use a number of the other conversion functions in series to achieve the desired result. All of these could be optimized. Feedback on better methods is welcome.

Example data structures

Here are examples of the 8 different tabular data structures. Note that if a transpose is performed (i.e. rows switched with columns or vice versa), the output is padded with None. Otherwise, it is left as is.

# lorl- list of lists where each inner list is a row
lorl = [
    ['a1', 'b1', 'c1'],    # row 1
    ['a2', 'b2', 'c2'],    # row 2
    ['a3', 'b3', 'c3'],    # row 3
    ['a4', 'b4',     ],    # row 4
    ]

# locl- list of lists where each inner list is a column
locl = [
    ['a1', 'a2', 'a3', 'a4'],    # col a
    ['b1', 'b2', 'b3', 'b4'],    # col b
    ['c1', 'c2', 'c3',     ],    # col c
    ]

# lord- list of dicts where each dict is a row
lord = [
    {'a':'a1', 'b':'b1', 'c':'c1'},   # row 1
    {'a':'a2', 'b':'b2', 'c':'c2'},   # row 2
    {'a':'a3', 'b':'b3', 'c':'c3'},   # row 3
    {'a':'a4', 'b':'b4',         },   # row 4
    ]

# locd- list of dicts where each dict is a column
locd = [
    {1:'a1', 2:'a2', 3:'a3', 4:'a4'},         # col a
    {1:'b1', 2:'b2', 3:'b3', 4:'b4'},         # col b
    {1:'c1', 2:'c2', 3:'c3',       },         # col c
    ]

# dorl- dict of lists where each list is a row
dorl = {
    1: ['a1', 'b1', 'c1'],            # row 1
    2: ['a2', 'b2', 'c2'],            # row 2
    3: ['a3', 'b3', 'c3'],            # row 3
    4: ['a4', 'b4',     ],            # row 4
    }

# docl- dict of lists where each list is a column
docl = {
    'a': ['a1', 'a2', 'a3', 'a4'],          # column a
    'b': ['b1', 'b2', 'b3', 'b4'],          # column b
    'c': ['c1', 'c2', 'c3',     ],          # column c
    }

# dord- dict of dicts where each inner dict is a row
dord = {
    1: {'a':'a1', 'b':'b1', 'c':'c1'},  # row 1
    2: {'a':'a2', 'b':'b2', 'c':'c2'},  # row 2
    3: {'a':'a3', 'b':'b3', 'c':'c3'},  # row 3
    4: {'a':'a4', 'b':'b4',         },  # row 4
    }

# docd- dict of dicts where each inner dict is a column
docd = {
    'a': {1:'a1', 2:'a2', 3:'a3', 4:'a4'},    # column a
    'b': {1:'b1', 2:'b2', 3:'b3', 4:'b4'},    # column b
    'c': {1:'c1', 2:'c2', 3:'c3',       },    # column c
    }

# list of row keys and column keys
rowkeys = [1, 2, 3, 4]
colkeys = ['a', 'b', 'c']

Code

Below is the library of functions.

"""tabular.py
Functions to convert tabular data structures
The following data structures are supported:
lorl- list of lists where each inner list is a row
locl- list of lists where each inner list is a column
lord- list of dicts where each dict is a row
locd- list of dicts where each dict is a column
dorl- dict of lists where each list is a row
docl- dict of lists where each list is a column
dord- dict of dicts where each inner dict is a row
docd- dict of dicts where each inner dict is a column
"""
#-------------------------------------------------------
# from lorl to ...
#-------------------------------------------------------
def lorl2locl(lorl):
    return [list(col) for col in map(None, *lorl)]

def lorl2lord(lorl, colkeys):
    return [dict(zip(colkeys, row)) for row in lorl]

def lorl2locd(lorl, rowkeys):
    # better way?
    return locl2locd(lorl2locl(lorl), rowkeys)

def lorl2dorl(lorl, rowkeys):
    return dict(zip(rowkeys, [row for row in lorl]))

def lorl2docl(lorl, colkeys):
    # better way?
    return locl2docl(lorl2locl(lorl), colkeys)

def lorl2dord(lorl, rowkeys, colkeys):
    return dict(zip(rowkeys, [dict(zip(colkeys, row))
                              for row in lorl]))

def lorl2docd(lorl, rowkeys, colkeys):
    # better way?
    return dict(zip(colkeys, [dict(zip(rowkeys, col))
                              for col in lorl2locl(lorl)]))

#-------------------------------------------------------
# from locl to ...
#-------------------------------------------------------
def locl2lorl(locl):
    return [list(row) for row in map(None, *locl)]

def locl2lord(locl, colkeys):
    # better way?
    return lorl2lord(locl2lorl(locl), colkeys)

def locl2locd(locl, rowkeys):
    return [dict(zip(rowkeys, col)) for col in locl]

def locl2dorl(locl, rowkeys):
    # better way?
    return dict(zip(rowkeys, [row for row in locl2lorl(locl)]))

def locl2docl(locl, colkeys):
    return dict(zip(colkeys, locl))

def locl2dord(locl, rowkeys, colkeys):
    # better way?
    return dict(zip(rowkeys, [dict(zip(colkeys, row))
                              for row in locl2lorl(locl)]))

def locl2docd(locl, rowkeys, colkeys):
    return dict(zip(colkeys, [dict(zip(rowkeys, col))
                              for col in locl]))

#-------------------------------------------------------
# from lord to ...
#-------------------------------------------------------
def lord2lorl(lord, colkeys):
    return [[row[key] for key in colkeys if key in row]
            for row in lord]

def lord2locl(lord, colkeys):
    # better way?
    return lorl2locl(lord2lorl(lord, colkeys))

def lord2locd(lord, rowkeys, colkeys):
    return [dict([(rkey, row[ckey])
                  for rkey, row in zip(rowkeys, lord) if ckey in row])
            for ckey in colkeys]

def lord2dorl(lord, rowkeys, colkeys):
    return dict(zip(rowkeys, [[row[ckey]
                               for ckey in colkeys if ckey in row]
                              for row in lord]))

def lord2docl(lord, colkeys):
    return dict(zip(colkeys, [[row[ckey]
                               for row in lord if ckey in row]
                              for ckey in colkeys]))

def lord2dord(lord, rowkeys):
    return dict(zip(rowkeys, lord))

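def lord2docd(lord, rowkeys, colkeys):
    # pair each row with its key before filtering, so a missing
    # value in one row does not shift the keys of later rows
    return dict(zip(colkeys,
                    [dict([(rkey, row[ckey])
                           for rkey, row in zip(rowkeys, lord)
                           if ckey in row])
                     for ckey in colkeys]))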

#-------------------------------------------------------
# from locd to ...
#-------------------------------------------------------
def locd2lorl(locd, rowkeys):
    # better way?
    return locl2lorl(locd2locl(locd, rowkeys))

def locd2locl(locd, rowkeys):
    return [[col[key] for key in rowkeys if key in col]
            for col in locd]

def locd2lord(locd, rowkeys, colkeys):
    return [dict([(ckey, col[rkey])
                  for ckey, col in zip(colkeys, locd) if rkey in col])
            for rkey in rowkeys]

def locd2dorl(locd, rowkeys):
    return dict(zip(rowkeys, [[col[rkey]
                               for col in locd if rkey in col]
                              for rkey in rowkeys]))

def locd2docl(locd, rowkeys, colkeys):
    return dict(zip(colkeys, [[col[rkey]
                               for rkey in rowkeys if rkey in col]
                              for col in locd]))

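def locd2dord(locd, rowkeys, colkeys):
    # pair each column with its key before filtering, so a missing
    # value in one column does not shift the keys of later columns
    return dict(zip(rowkeys,
                    [dict([(ckey, col[rkey])
                           for ckey, col in zip(colkeys, locd)
                           if rkey in col])
                     for rkey in rowkeys]))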

def locd2docd(locd, colkeys):
    return dict(zip(colkeys, locd))

#-------------------------------------------------------
# from dorl to ...
#-------------------------------------------------------
def dorl2lorl(dorl, rowkeys):
    return [dorl[key] for key in rowkeys]

def dorl2locl(dorl, rowkeys):
    # better way?
    return lorl2locl(dorl2lorl(dorl, rowkeys))

def dorl2lord(dorl, rowkeys, colkeys):
    return [dict(zip(colkeys, dorl[rkey]))
            for rkey in rowkeys]

def dorl2locd(dorl, rowkeys):
    # better way?
    return locl2locd(lorl2locl(dorl2lorl(dorl, rowkeys)), rowkeys)

def dorl2docl(dorl, rowkeys, colkeys):
    # better way?
    return locl2docl(lorl2locl(dorl2lorl(dorl, rowkeys)), colkeys)

def dorl2dord(dorl, rowkeys, colkeys):
    # better way?
    return lorl2dord(dorl2lorl(dorl, rowkeys), rowkeys, colkeys)

def dorl2docd(dorl, rowkeys, colkeys):
    # better way?
    return locl2docd(lorl2locl(dorl2lorl(dorl, rowkeys)),
                     rowkeys, colkeys)

#-------------------------------------------------------
# from docl to ...
#-------------------------------------------------------
def docl2lorl(docl, colkeys):
    # better way?
    return locl2lorl(docl2locl(docl, colkeys))

def docl2locl(docl, colkeys):
    return [docl[key] for key in colkeys]

def docl2lord(docl, rowkeys, colkeys):
    # better way?
    return lorl2lord(locl2lorl(docl2locl(docl, colkeys)), colkeys)

def docl2locd(docl, rowkeys, colkeys):
    #
    return [dict(zip(rowkeys, docl[ckey]))
            for ckey in colkeys]

def docl2dorl(docl, rowkeys, colkeys):
    # better way?
    return lorl2dorl(locl2lorl(docl2locl(docl, colkeys)), rowkeys)

def docl2dord(docl, rowkeys, colkeys):
    # better way?
    return lorl2dord(locl2lorl(docl2locl(docl, colkeys)),
                     rowkeys, colkeys)

def docl2docd(docl, rowkeys, colkeys):
    # better way?
    return locl2docd(docl2locl(docl, colkeys), rowkeys, colkeys)

#-------------------------------------------------------
# from dord to ...
#-------------------------------------------------------
def dord2lorl(dord, rowkeys, colkeys):
    return [[dord[rkey][ckey]
             for ckey in colkeys if ckey in dord[rkey]]
            for rkey in rowkeys if rkey in dord]

def dord2locl(dord, rowkeys, colkeys):
    # better way?
    return lorl2locl(dord2lorl(dord, rowkeys, colkeys))

def dord2lord(dord, rowkeys):
    return [dord[rkey] for rkey in rowkeys]

def dord2locd(dord, rowkeys, colkeys):
    # better way?
    return lord2locd(dord2lord(dord, rowkeys), rowkeys, colkeys)

def dord2dorl(dord, rowkeys, colkeys):
    # don't need zip
    return dict([(rkey, [dord[rkey][ckey]
                         for ckey in colkeys if ckey in dord[rkey]])
                 for rkey in rowkeys])

def dord2docl(dord, rowkeys, colkeys):
    # better way?
    return locl2docl(lorl2locl(dord2lorl(dord, rowkeys, colkeys)),
                     colkeys)

def dord2docd(dord, rowkeys, colkeys):
    # better way?
    return locl2docd(lorl2locl(dord2lorl(dord, rowkeys, colkeys)),
                     rowkeys, colkeys)

#-------------------------------------------------------
# from docd to ...
#-------------------------------------------------------
def docd2lorl(docd, rowkeys, colkeys):
    # better way?
    return locl2lorl(docd2locl(docd, rowkeys, colkeys))

def docd2locl(docd, rowkeys, colkeys):
    return [[docd[ckey][rkey]
             for rkey in rowkeys if rkey in docd[ckey]]
            for ckey in colkeys if ckey in docd]

def docd2lord(docd, rowkeys, colkeys):
    # better way?
    return locd2lord(docd2locd(docd, colkeys), rowkeys, colkeys)

def docd2locd(docd, colkeys):
    return [docd[ckey] for ckey in colkeys]

def docd2dorl(docd, rowkeys, colkeys):
    # better way?
    return lorl2dorl(locl2lorl(docd2locl(docd, rowkeys, colkeys)),
                     rowkeys)

def docd2docl(docd, rowkeys, colkeys):
    # don't need zip
    return dict([(ckey, [docd[ckey][rkey]
                         for rkey in rowkeys if rkey in docd[ckey]])
                 for ckey in colkeys])

def docd2dord(docd, rowkeys, colkeys):
    # better way?
    return lorl2dord(locl2lorl(docd2locl(docd, rowkeys, colkeys)),
                     rowkeys, colkeys)
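
A note for Python 3: the two transpose functions above rely on map(None, ...), which Python 3 no longer accepts. The following is a sketch of how they could be ported, assuming itertools.zip_longest as the replacement. The rest of the library only uses zip, dict, and list comprehensions, so it should need no similar change, though I have not tested every function.

```python
from itertools import zip_longest


def lorl2locl(lorl):
    """Transpose a list of rows into a list of columns, padding with None."""
    # zip_longest keeps going until the longest row is exhausted and
    # fills the gaps with None, like map(None, *lorl) did in Python 2
    return [list(col) for col in zip_longest(*lorl)]


def locl2lorl(locl):
    """Transpose a list of columns into a list of rows, padding with None."""
    return [list(row) for row in zip_longest(*locl)]


lorl = [
    ['a1', 'b1', 'c1'],    # row 1
    ['a2', 'b2', 'c2'],    # row 2
    ['a3', 'b3', 'c3'],    # row 3
    ['a4', 'b4'],          # row 4
]
print(lorl2locl(lorl))
```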

Iterating through two lists in parallel using zip()

2007-12-19T17:33:00-08:00

From the Python docs, zip returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. This is useful for iterating over two lists in parallel. For example, if I have two lists, I can get the first element of both lists, then the second element of both lists, then the third, etc.

Python 2.5.1 (r251:54863, May 18 2007, 16:56:43)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = [1,2,3]
>>> b = ['a','b','c']
>>> for i,j in zip(a,b):
...     print i, j
...
1 a
2 b
3 c
>>>

If the lists are different lengths, zip truncates to the length of the shortest list. Using map with None is similar to zip except the results are padded with None.

>>> a = [1,2,3]
>>> b = ['a','b','c','d']
>>> zip(a,b)
[(1, 'a'), (2, 'b'), (3, 'c')]
>>> map(None,a,b)
[(1, 'a'), (2, 'b'), (3, 'c'), (None, 'd')]
>>>
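
In Python 3 these two calls behave differently: zip returns a lazy iterator instead of a list, and map no longer accepts None as the function. Here is a rough Python 3 equivalent of the session above, assuming itertools.zip_longest as the stand-in for map(None, ...). The variable names truncated and padded are just for this sketch.

```python
from itertools import zip_longest

a = [1, 2, 3]
b = ['a', 'b', 'c', 'd']

# zip is lazy in Python 3, so wrap it in list() to see the pairs
truncated = list(zip(a, b))

# zip_longest pads the shorter input; fillvalue defaults to None
padded = list(zip_longest(a, b))

print(truncated)
print(padded)
```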

If I have a list of keys and a list of values, I can create a dictionary by passing the output of zip to dict.

>>> mykeys = ['a', 'b', 'c']
>>> myvalues = [1, 2, 3]
>>> dict(zip(mykeys, myvalues))
{'a': 1, 'c': 3, 'b': 2}
>>>

See also this thread on the Python mailing list: Iterating through two lists

How to convert a dictionary of lists to a list of lists in Python

2007-12-10T14:09:00-08:00

UPDATE: See my new post, Tabular data structure conversion in Python for an updated method which handles non-rectangular data.

The functions below convert a rectangular dictionary of lists to a list of lists. Each list in the dictionary must be the same length. Additionally, a list of keys is required as an input argument to specify the desired ordering of the columns in the returned list of lists. If this were not specified, the order of the columns would be unknown since items in a dictionary are unordered.

The converted list of lists can contain either a list of rows or a list of columns. The first two functions create a list of rows; the last two create a list of columns. (I consider each list in the dict of lists as a column, and all items for a given index a row.)

I also compare the imperative/ procedural approach to the declarative/ functional approach. I like the declarative/functional approach because it is so concise, and, I believe, a little faster as well.

#!/usr/bin/python

# IMPERATIVE/PROCEDURAL APPROACH
def byrow_imper(dol, keylist):
    """Converts a dictionary of lists to a list of lists using the
    values of the dictionaries. Each list must be the same length.
       dol: dictionary of lists
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are rows. 
         i.e. returns a list of rows. """
    lol = []
    for i in xrange(len(dol[keylist[0]])):
        row = []
        for key in keylist:
            row.append(dol[key][i])
        lol.append(row)
    return lol

# DECLARATIVE/FUNCTIONAL APPROACH
def byrow_decl(dol, keylist):
    """Converts a dictionary of lists to a list of lists using the
    values of the dictionaries. Each list must be the same length.
       dol: dictionary of lists
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are rows. 
         i.e. returns a list of rows. """
    return [[dol[key][i] for key in keylist] 
            for i in xrange(len(dol[keylist[0]]))]

# IMPERATIVE/PROCEDURAL APPROACH
def bycol_imper(dol, keylist):
    """Converts a dictionary of lists to a list of lists using the
    values of the dictionaries. Each list must be the same length.
       dol: dictionary of lists
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are columns. 
         i.e. returns a list of columns. """
    lol = []
    for key in keylist:
        col = []
        for item in dol[key]:
            col.append(item)
        lol.append(col)
    return lol

# DECLARATIVE/FUNCTIONAL APPROACH
def bycol_decl(dol, keylist):
    """Converts a dictionary of lists to a list of lists using the
    values of the dictionaries. Each list must be the same length.
       dol: dictionary of lists
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are columns. 
         i.e. returns a list of columns. """
    return [[item for item in dol[key]] for key in keylist]

# TEST
if __name__ == "__main__": 
    dol = {
        'a': ['a1', 'a2', 'a3'],   # column a
        'b': ['b1', 'b2', 'b3'],   # column b
        'c': ['c1', 'c2', 'c3'],   # column c
        }
    keylist = ['a', 'b', 'c']
    print byrow_imper(dol, keylist)
    print byrow_decl(dol, keylist)
    print bycol_imper(dol, keylist)
    print bycol_decl(dol, keylist)

Results:

[['a1', 'b1', 'c1'], ['a2', 'b2', 'c2'], ['a3', 'b3', 'c3']]
[['a1', 'b1', 'c1'], ['a2', 'b2', 'c2'], ['a3', 'b3', 'c3']]
[['a1', 'a2', 'a3'], ['b1', 'b2', 'b3'], ['c1', 'c2', 'c3']]
[['a1', 'a2', 'a3'], ['b1', 'b2', 'b3'], ['c1', 'c2', 'c3']]
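
Another way to build the list of rows is to select the columns in the desired order and let zip regroup them. This is a sketch written for Python 3 (it also runs on Python 2). The name byrow_zip is hypothetical and not one of the four functions above.

```python
def byrow_zip(dol, keylist):
    """Return a list of rows from a dictionary of equal-length lists."""
    # pick the columns in the requested order, then let zip regroup
    # the i-th items of every column into one row
    columns = [dol[key] for key in keylist]
    return [list(row) for row in zip(*columns)]


dol = {
    'a': ['a1', 'a2', 'a3'],   # column a
    'b': ['b1', 'b2', 'b3'],   # column b
    'c': ['c1', 'c2', 'c3'],   # column c
}
print(byrow_zip(dol, ['a', 'b', 'c']))
```

Note that zip stops at the shortest column, so lists of unequal length are silently truncated instead of raising an error.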

How to convert a list of dictionaries to a list of lists in Python

2007-12-10T12:28:00-08:00

UPDATE: See my new post, Tabular data structure conversion in Python for an updated method which handles non-rectangular data.

The functions below convert a rectangular list of dictionaries to a list of lists. Each dictionary in the list must have the same keys. Additionally, a list of keys is required as an input argument to specify the desired ordering of the columns in the returned list of lists. If this were not specified, the order of the columns would be unknown since items in a dictionary are unordered.

The converted list of lists can contain either a list of rows or a list of columns. The first two functions create a list of rows; the last two create a list of columns. (I consider each dict in the list of dicts as a row, and all values for a given key as a column.)

#!/usr/bin/python

def byrow_imper(lod, keylist):
    """Converts a list of dictionaries to a list of lists using the
    values of the dictionaries. Assumes that each dictionary has the
    same keys. 
       lod: list of dictionaries
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are rows. 
         i.e. returns a list of rows. """
    # imperative/procedural approach
    lol = []
    for row in lod:
        row2 = []
        for key in keylist:
            row2.append(row[key])
        lol.append(row2)
    return lol

def byrow_decl(lod, keylist):
    """Converts a list of dictionaries to a list of lists using the
    values of the dictionaries. Assumes that each dictionary has the
    same keys. 
       lod: list of dictionaries
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are rows. 
         i.e. returns a list of rows. """
    # declarative/functional approach
    return [[row[key] for key in keylist] for row in lod]

def bycol_imper(lod, keylist):
    """Converts a list of dictionaries to a list of lists using the
    values of the dictionaries. Assumes that each dictionary has the
    same keys. 
       lod: list of dictionaries
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are columns. 
         i.e. returns a list of columns. """
    # imperative/procedural approach
    lol = []
    for key in keylist:
        col = []
        for row in lod:
            col.append(row[key])
        lol.append(col)
    return lol

def bycol_decl(lod, keylist):
    """Converts a list of dictionaries to a list of lists using the
    values of the dictionaries. Assumes that each dictionary has the
    same keys. 
       lod: list of dictionaries
       keylist: list of keys, ordered as desired
       Returns: a list of lists where the inner lists are columns. 
         i.e. returns a list of columns. """
    # declarative/functional approach
    return  [[row[key] for row in lod] for key in keylist]

if __name__ == "__main__": 
    lod = [
        {'a':1, 'b':2, 'c':3},
        {'a':4, 'b':5, 'c':6},
        {'a':7, 'b':8, 'c':9},
        ]
    keylist = ['a', 'b', 'c']
    print byrow_imper(lod, keylist)
    print byrow_decl(lod, keylist)
    print bycol_imper(lod, keylist)
    print bycol_decl(lod, keylist)

Results:

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
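
The functions above raise a KeyError if a dictionary is missing one of the keys. If a placeholder is acceptable instead, dict.get can supply one. This is a small sketch, not part of the original functions. The names byrow_padded, bycol_padded, and the fill parameter are made up for illustration.

```python
def byrow_padded(lod, keylist, fill=None):
    """Return a list of rows, using fill where a dictionary lacks a key."""
    # dict.get returns fill instead of raising KeyError
    return [[row.get(key, fill) for key in keylist] for row in lod]


def bycol_padded(lod, keylist, fill=None):
    """Return a list of columns, using fill where a dictionary lacks a key."""
    return [[row.get(key, fill) for row in lod] for key in keylist]


lod = [
    {'a': 1, 'b': 2, 'c': 3},
    {'a': 4, 'b': 5},          # no 'c' in this row
]
print(byrow_padded(lod, ['a', 'b', 'c']))
print(bycol_padded(lod, ['a', 'b', 'c']))
```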

PyQt 4.3 QTableView / QAbstractTableModel sorting example

2007-12-05T18:54:00-08:00

It took me a while to figure out why QTableView's setSortingEnabled method wasn't working. It turns out the default sort method in QAbstractItemModel does nothing, so I had to implement it myself. Hence, my previous post, How to sort a table by columns in Python. I'm not sure if this is the best way to implement the sort method, but I couldn't find anything else out there, and this seems to work for me.

import re
import operator
import os
import sys 
from PyQt4.QtCore import * 
from PyQt4.QtGui import * 
 
def main(): 
    app = QApplication(sys.argv) 
    w = MyWindow() 
    w.show() 
    sys.exit(app.exec_()) 
 
class MyWindow(QWidget): 
    def __init__(self, *args): 
        QWidget.__init__(self, *args) 

        # create table
        self.get_table_data()
        table = self.createTable() 
         
        # layout
        layout = QVBoxLayout()
        layout.addWidget(table) 
        self.setLayout(layout) 

    def get_table_data(self):
        stdouterr = os.popen4("dir c:\\")[1].read()
        lines = stdouterr.splitlines()
        lines = lines[5:]
        lines = lines[:-2]
        self.tabledata = [re.split(r"\s+", line, 4)
                     for line in lines]

    def createTable(self):
        # create the view
        tv = QTableView()

        # set the table model
        header = ['date', 'time', '', 'size', 'filename']
        tm = MyTableModel(self.tabledata, header, self) 
        tv.setModel(tm)

        # set the minimum size
        tv.setMinimumSize(400, 300)

        # hide grid
        tv.setShowGrid(False)

        # set the font
        font = QFont("Courier New", 8)
        tv.setFont(font)

        # hide vertical header
        vh = tv.verticalHeader()
        vh.setVisible(False)

        # set horizontal header properties
        hh = tv.horizontalHeader()
        hh.setStretchLastSection(True)

        # set column width to fit contents
        tv.resizeColumnsToContents()

        # set row height
        nrows = len(self.tabledata)
        for row in xrange(nrows):
            tv.setRowHeight(row, 18)

        # enable sorting
        tv.setSortingEnabled(True)

        return tv
 
class MyTableModel(QAbstractTableModel): 
    def __init__(self, datain, headerdata, parent=None, *args): 
        """ datain: a list of lists
            headerdata: a list of strings
        """
        QAbstractTableModel.__init__(self, parent, *args) 
        self.arraydata = datain
        self.headerdata = headerdata
 
    def rowCount(self, parent): 
        return len(self.arraydata) 
 
    def columnCount(self, parent): 
        return len(self.arraydata[0]) 
 
    def data(self, index, role): 
        if not index.isValid(): 
            return QVariant() 
        elif role != Qt.DisplayRole: 
            return QVariant() 
        return QVariant(self.arraydata[index.row()][index.column()]) 

    def headerData(self, col, orientation, role):
        if orientation == Qt.Horizontal and role == Qt.DisplayRole:
            return QVariant(self.headerdata[col])
        return QVariant()

    def sort(self, Ncol, order):
        """Sort table by given column number.
        """
        self.emit(SIGNAL("layoutAboutToBeChanged()"))
        self.arraydata = sorted(self.arraydata, key=operator.itemgetter(Ncol))        
        if order == Qt.DescendingOrder:
            self.arraydata.reverse()
        self.emit(SIGNAL("layoutChanged()"))

if __name__ == "__main__": 
    main()

How to sort a table by columns in Python

2007-12-05T18:19:00-08:00

I have a 2-dimensional table of data implemented as a list of lists in Python. I would like to sort the data by an arbitrary column. This is a common task with tabular data. For example, Windows Explorer allows me to sort the list of files by Name, Size, Type, or Date Modified. I tried the code from this article. However, if there are duplicate entries in the column being sorted, the duplicates are removed. This is not what I wanted, so I did some further searching and found a nice solution from the HowTo/Sorting article on the PythonInfo Wiki. This method also uses the built-in sorted() function, as well as the key parameter and operator.itemgetter(). (See sections 2.1 and 6.7 of the Python Library Reference for more information.) The following code sorts the table by the second column (index 1). Note, Python 2.4 or later is required.

import operator

def sort_table(table, col=0):
    return sorted(table, key=operator.itemgetter(col))

if __name__ == '__main__':
    mytable = (
        ('Joe', 'Clark', '1989'),
        ('Charlie', 'Babbitt', '1988'),
        ('Frank', 'Abagnale', '2002'),
        ('Bill', 'Clark', '2009'),
        ('Alan', 'Clark', '1804'),
        )
    for row in sort_table(mytable, 1):
        print row

Results:

('Frank', 'Abagnale', '2002')
('Charlie', 'Babbitt', '1988')
('Joe', 'Clark', '1989')
('Bill', 'Clark', '2009')
('Alan', 'Clark', '1804')

This works well, but I would also like the table to be sorted by column 0 in addition to column 1. In this example, column 1 holds the Last Name and column 0 holds the First Name. I would like the table to be sorted first by Last Name, and then by First Name. Here is the code to sort the table by multiple columns. The cols argument is a tuple specifying the columns to sort by. The first column to sort by is listed first, the second second, and so on.

import operator

def sort_table(table, cols):
    """ sort a table by multiple columns
        table: a list of lists (or tuple of tuples) where each inner list 
               represents a row
        cols:  a list (or tuple) specifying the column numbers to sort by
               e.g. (1,0) would sort by column 1, then by column 0
    """
    for col in reversed(cols):
        table = sorted(table, key=operator.itemgetter(col))
    return table

if __name__ == '__main__':
    mytable = (
        ('Joe', 'Clark', '1989'),
        ('Charlie', 'Babbitt', '1988'),
        ('Frank', 'Abagnale', '2002'),
        ('Bill', 'Clark', '2009'),
        ('Alan', 'Clark', '1804'),
        )
    for row in sort_table(mytable, (1,0)):
        print row

Results:

('Frank', 'Abagnale', '2002')
('Charlie', 'Babbitt', '1988')
('Alan', 'Clark', '1804')
('Bill', 'Clark', '2009')
('Joe', 'Clark', '1989')
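
The repeated-sort loop above works because sorted() is stable: rows that tie on the column sorted last keep the order set by the earlier passes. As a rough sketch, assuming Python 3 (the listings above are Python 2), operator.itemgetter() also accepts several indices and returns a tuple, so the same ordering can be produced in a single pass. The name sort_table_once is made up for this illustration.

```python
import operator

def sort_table_once(table, cols):
    """Sort rows by several columns in one pass.

    itemgetter(1, 0) returns the tuple (row[1], row[0]) for each row,
    and tuples compare element by element, left to right.
    """
    return sorted(table, key=operator.itemgetter(*cols))

mytable = (
    ('Joe', 'Clark', '1989'),
    ('Charlie', 'Babbitt', '1988'),
    ('Frank', 'Abagnale', '2002'),
    ('Bill', 'Clark', '2009'),
    ('Alan', 'Clark', '1804'),
)

# Sort by last name (column 1), then first name (column 0)
for row in sort_table_once(mytable, (1, 0)):
    print(row)
```

The multi-pass version is still the one to reach for if different columns need different sort directions, since each pass can take its own reverse argument.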

PyQt 4.3 QTableView / QAbstractTableModel formatting example

2007-12-04T18:06:00-08:00

This example displays the output of the "dir" command using QTableView and QAbstractTableModel. It adds some more formatting to my previous example such as specifying the font, a header, column width, row height, etc.

import re
import os
import sys 
from PyQt4.QtCore import * 
from PyQt4.QtGui import * 
 
def main(): 
    app = QApplication(sys.argv) 
    w = MyWindow() 
    w.show() 
    sys.exit(app.exec_()) 
 
class MyWindow(QWidget): 
    def __init__(self, *args): 
        QWidget.__init__(self, *args) 

        # create table
        self.get_table_data()
        table = self.createTable() 
         
        # layout
        layout = QVBoxLayout()
        layout.addWidget(table) 
        self.setLayout(layout) 

    def get_table_data(self):
        stdouterr = os.popen4("dir c:\\")[1].read()
        lines = stdouterr.splitlines()
        lines = lines[5:]
        lines = lines[:-2]
        self.tabledata = [re.split(r"\s+", line, 4)
                     for line in lines]

    def createTable(self):
        # create the view
        tv = QTableView()

        # set the table model
        header = ['date', 'time', '', 'size', 'filename']
        tm = MyTableModel(self.tabledata, header, self) 
        tv.setModel(tm)

        # set the minimum size
        self.setMinimumSize(400, 300)

        # hide grid
        tv.setShowGrid(False)

        # set the font
        font = QFont("Courier New", 8)
        tv.setFont(font)

        # hide vertical header
        vh = tv.verticalHeader()
        vh.setVisible(False)

        # set horizontal header properties
        hh = tv.horizontalHeader()
        hh.setStretchLastSection(True)

        # set column width to fit contents
        tv.resizeColumnsToContents()

        # set row height
        nrows = len(self.tabledata)
        for row in xrange(nrows):
            tv.setRowHeight(row, 18)

        # enable sorting
        # this doesn't work
        #tv.setSortingEnabled(True)

        return tv
 
class MyTableModel(QAbstractTableModel): 
    def __init__(self, datain, headerdata, parent=None, *args): 
        QAbstractTableModel.__init__(self, parent, *args) 
        self.arraydata = datain
        self.headerdata = headerdata
 
    def rowCount(self, parent): 
        return len(self.arraydata) 
 
    def columnCount(self, parent): 
        return len(self.arraydata[0]) 
 
    def data(self, index, role): 
        if not index.isValid(): 
            return QVariant() 
        elif role != Qt.DisplayRole: 
            return QVariant() 
        return QVariant(self.arraydata[index.row()][index.column()]) 

    def headerData(self, col, orientation, role):
        if orientation == Qt.Horizontal and role == Qt.DisplayRole:
            return QVariant(self.headerdata[col])
        return QVariant()

if __name__ == "__main__": 
    main()

PyQt example: How to run a command and display its stdout

2007-12-03T13:51:00-08:00

This widget consists of a QLineEdit class and a QTextEdit class. The user enters a DOS command in the input box, hits RETURN, and the stdout from the command is displayed in the text box.

import os
import sys 
from PyQt4.QtCore import * 
from PyQt4.QtGui import * 
 
def main(): 
    app = QApplication(sys.argv) 
    w = MyWindow() 
    w.show() 
    sys.exit(app.exec_()) 
 
class MyWindow(QWidget): 
    def __init__(self, *args): 
        QWidget.__init__(self, *args) 
 
        # create objects
        label = QLabel(self.tr("Enter command and press Return"))
        self.le = QLineEdit()
        self.te = QTextEdit()

        # layout
        layout = QVBoxLayout(self)
        layout.addWidget(label)
        layout.addWidget(self.le)
        layout.addWidget(self.te)
        self.setLayout(layout) 

        # create connection
        self.connect(self.le, SIGNAL("returnPressed(void)"),
                     self.run_command)

    def run_command(self):
        cmd = str(self.le.text())
        stdouterr = os.popen4(cmd)[1].read()
        self.te.setText(stdouterr)
  
if __name__ == "__main__": 
    main()

How to display the stdout of a command with PyQt

2007-12-03T13:31:00-08:00

This widget contains a QPushButton and a QTextEdit box. When the button is pushed, the output of the dir command is displayed in the text box.

import os
import sys 
from PyQt4.QtCore import * 
from PyQt4.QtGui import * 
 
def main(): 
    app = QApplication(sys.argv) 
    w = MyWindow() 
    w.show() 
    sys.exit(app.exec_()) 
 
class MyWindow(QWidget): 
    def __init__(self, *args): 
        QWidget.__init__(self, *args) 
 
        # create objects
        self.pb = QPushButton(self.tr("Run command"))
        self.te = QTextEdit()

        # layout
        layout = QVBoxLayout(self)
        layout.addWidget(self.pb)
        layout.addWidget(self.te)
        self.setLayout(layout) 

        # create connection
        self.connect(self.pb, SIGNAL("clicked(bool)"),
                     self.run_command)

    def run_command(self):
        stdouterr = os.popen4("dir")[1].read()
        self.te.setText(stdouterr)
  
if __name__ == "__main__": 
    main()
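
A note for anyone running this on Python 3: os.popen4() does not exist there. Here is a minimal sketch of the same capture step using the subprocess module, assuming Python 3.7 or later (for the text argument). The helper name run_and_capture and the sample command are made up for illustration; the GUI code is left out.

```python
import subprocess
import sys

def run_and_capture(args):
    """Run a command and return its stdout and stderr as one string."""
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as popen4 did
        text=True,                 # decode bytes to str
    )
    return result.stdout

# Hypothetical command: ask the current Python interpreter to print a line
output = run_and_capture([sys.executable, "-c", "print('hello')"])
print(output)
```

The returned string could then be passed to setText() in place of the popen4 result.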

How to copy Python lists or other objects

2007-11-29T17:23:00-08:00

This problem had me stumped for a while today. If I have a list a, setting b = a doesn't make a copy of the list a. Instead, it makes a new reference to a. For example, see the interactive Python session below:

Python 2.5.1 (r251:54863, May 18 2007, 16:56:43)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = [1,2,3]
>>> b = a
>>> b
[1, 2, 3]
>>> a.append(4)
>>> a
[1, 2, 3, 4]
>>> b
[1, 2, 3, 4]
>>>

Here is a quick reference extracted from Chapter 9 in Learning Python, 1st Edition.

To make a copy of a list, use the following:

newList = myList[:]
newList2 = list(myList2)         # alternate method

To make a copy of a dict, use the following:

newDict = myDict.copy()

To make a copy of some other object, use the copy module:

import copy
newObj = copy.copy(myObj)        # shallow copy
newObj2 = copy.deepcopy(myObj2)  # deep copy

For more information on shallow and deep copies with the copy module, see the Python docs.
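
To make the shallow versus deep distinction concrete, here is a small sketch (Python 3 syntax; the variable names are made up for illustration). A shallow copy creates a new outer list but shares the inner lists, while a deep copy duplicates the inner lists too.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)    # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

# Mutate an inner list of the original
nested[0].append(99)

print(shallow[0])  # [1, 2, 99] because the inner list is shared
print(deep[0])     # [1, 2] because the inner list was copied
```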

Python circular buffer

2007-11-29T12:06:00-08:00

Here is a simple circular buffer, or ring buffer, implementation in Python. It is a first-in, first-out (FIFO) buffer with a fixed size.

class RingBuffer:
    def __init__(self, size):
        self.data = [None for i in xrange(size)]

    def append(self, x):
        self.data.pop(0)
        self.data.append(x)

    def get(self):
        return self.data

Here is an example where the buffer size is 4. Ten integers, 0-9, are inserted, one at a time, at the end of the buffer. Each iteration, the first element is removed from the front of the buffer.

buf = RingBuffer(4)
for i in xrange(10):
    buf.append(i)
    print buf.get()

Here are the results:

[None, None, None, 0]
[None, None, 0, 1]
[None, 0, 1, 2]
[0, 1, 2, 3]
[1, 2, 3, 4]
[2, 3, 4, 5]
[3, 4, 5, 6]
[4, 5, 6, 7]
[5, 6, 7, 8]
[6, 7, 8, 9]
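
As an alternative sketch, assuming Python 2.6 or later (written here in Python 3 syntax), collections.deque takes a maxlen argument and discards the oldest item automatically when a new one is appended. This is a different technique from the class above, not a drop-in replacement for it, but it avoids list.pop(0), which has to shift every remaining element.

```python
from collections import deque

# A fixed-size buffer of 4 slots, pre-filled with None like the class above
buf = deque([None] * 4, maxlen=4)

for i in range(10):
    buf.append(i)  # the oldest item falls off the left end automatically

print(list(buf))  # [6, 7, 8, 9]
```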

Django project #3: Creating models

2007-11-28T23:55:00-08:00

This section in the tutorial was actually very straightforward. Here is a record of what I did. I first created a polls app.

sofeng@tortoise:~/Web/mysite$ python manage.py startapp polls
sofeng@tortoise:~/Web/mysite$ ll
total 92
-rw-r--r-- 1     0 2007 11/09 21:42 __init__.py
-rw-r--r-- 1   131 2007 11/09 21:56 __init__.pyc
-rwxr-xr-x 1   542 2007 11/09 21:42 manage.py
-rw-r--r-- 1 35840 2007 11/28 23:07 mydb
-rw-r--r-- 1 25600 2007 11/14 23:13 mydb_backup
drwxr-xr-x 2  4096 2007 11/28 23:15 polls
-rw-r--r-- 1  2886 2007 11/28 22:41 settings.py
-rw-r--r-- 1  1873 2007 11/28 22:42 settings.pyc
-rw-r--r-- 1   224 2007 11/28 22:44 urls.py
-rw-r--r-- 1   302 2007 11/28 22:45 urls.pyc
sofeng@tortoise:~/Web/mysite$ ll polls
total 8
-rw-r--r-- 1  0 2007 11/28 23:15 __init__.py
-rw-r--r-- 1 57 2007 11/28 23:15 models.py
-rw-r--r-- 1 26 2007 11/28 23:15 views.py

I entered the following into polls/models.py:

from django.db import models

class Poll(models.Model):
    question = models.CharField(maxlength=200)
    pub_date = models.DateTimeField('date published')

class Choice(models.Model):
    poll = models.ForeignKey(Poll)
    choice = models.CharField(maxlength=200)
    votes = models.IntegerField()

I edited settings.py to include the new app:

INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'mysite.polls',
)

I ran python manage.py sql polls to see the SQL CREATE TABLE statements for the polls app.

sofeng@tortoise:~/Web/mysite$ python manage.py sql polls
BEGIN;
CREATE TABLE "polls_poll" (
    "id" integer NOT NULL PRIMARY KEY,
    "question" varchar(200) NOT NULL,
    "pub_date" datetime NOT NULL
);
CREATE TABLE "polls_choice" (
    "id" integer NOT NULL PRIMARY KEY,
    "poll_id" integer NOT NULL REFERENCES "polls_poll" ("id"),
    "choice" varchar(200) NOT NULL,
    "votes" integer NOT NULL
);
COMMIT;

I ran python manage.py syncdb to create the model tables in the database.

sofeng@tortoise:~/Web/mysite$ python manage.py syncdb
Creating table polls_poll
Creating table polls_choice
Installing index for polls.Choice model
Loading 'initial_data' fixtures...
No fixtures found.

I played with the shell.

sofeng@tortoise:~/Web/mysite$ python manage.py shell
Python 2.5.1 (r251:54863, Oct  5 2007, 13:36:32) 
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from mysite.polls.models import Poll, Choice
>>> Poll.objects.all()
[]
>>> from datetime import datetime
>>> p = Poll(question="What's up?", pub_date=datetime.now())
>>> p.save()
>>> p.id
1
>>> p.question
"What's up?"
>>> p.pub_date
datetime.datetime(2007, 11, 29, 1, 34, 4, 883118)
>>> p.pub_date = datetime(2005, 4, 1, 0, 0)
>>> p.save()
>>> Poll.objects.all()
[<Poll: Poll object>]
>>>

I edited polls/models.py so that it looked like this:

import datetime
from django.db import models

class Poll(models.Model):
    question = models.CharField(maxlength=200)
    pub_date = models.DateTimeField('date published')

    def __str__(self):
        return self.question

    def was_published_today(self):
        return self.pub_date.date() == datetime.date.today()

class Choice(models.Model):
    poll = models.ForeignKey(Poll)
    choice = models.CharField(maxlength=200)
    votes = models.IntegerField()

    def __str__(self):
        return self.choice

I went back to the shell.

sofeng@tortoise:~/Web/mysite$ python manage.py shell
Python 2.5.1 (r251:54863, Oct  5 2007, 13:36:32) 
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from mysite.polls.models import Poll, Choice
>>> Poll.objects.all()
[<Poll: What's up?>]
>>> Poll.objects.filter(id=1)
[<Poll: What's up?>]
>>> Poll.objects.filter(question__startswith='What')
[<Poll: What's up?>]
>>> Poll.objects.get(pub_date__year=2005)
<Poll: What's up?>
>>> Poll.objects.get(id=2)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/var/lib/python-support/python2.5/django/db/models/manager.py", line 73, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/var/lib/python-support/python2.5/django/db/models/query.py", line 252, in get
    raise self.model.DoesNotExist, "%s matching query does not exist." % self.model._meta.object_name
DoesNotExist: Poll matching query does not exist.
>>> Poll.objects.get(pk=1)
<Poll: What's up?>
>>> p = Poll.objects.get(pk=1)
>>> p.was_published_today()
False
>>> p = Poll.objects.get(pk=1)
>>> p.choice_set.create(choice='Not much', votes=0)
<Choice: Not much>
>>> p.choice_set.create(choice='The sky', votes=0)
<Choice: The sky>
>>> c = p.choice_set.create(choice='Just hacking again', votes=0)
>>> c.poll
<Poll: What's up?>
>>> p.choice_set.all()
[<Choice: Not much>, <Choice: The sky>, <Choice: Just hacking again>]
>>> p.choice_set.count()
3
>>> Choice.objects.filter(poll__pub_date__year=2005)
[<Choice: Not much>, <Choice: The sky>, <Choice: Just hacking again>]
>>> c = p.choice_set.filter(choice__startswith='Just hacking')
>>> c.delete()
>>>

That's it for now. Next time I will work with Django's admin web interface.

How to remove C style comments using Python

2007-11-28T17:25:00-08:00

The Perl FAQ has an entry, "How do I use a regular expression to strip C style comments from a file?" Since I've switched to Python, I've adapted the Perl solution to Python. This regular expression was created by Jeffrey Friedl and later modified by Fred Curtis. I'm not certain, but it appears to use the "unrolling the loop" technique described in Chapter 6 of Mastering Regular Expressions.

remove_comments.py:

import re
import sys

def remove_comments(text):
    """ remove c-style comments.
        text: blob of text with comments (can include newlines)
        returns: text with comments removed
    """
    pattern = r"""
                            ##  --------- COMMENT ---------
           /\*              ##  Start of /* ... */ comment
           [^*]*\*+         ##  Non-* followed by 1-or-more *'s
           (                ##
             [^/*][^*]*\*+  ##
           )*               ##  0-or-more things which don't start with /
                            ##    but do end with '*'
           /                ##  End of /* ... */ comment
         |                  ##  -OR-  various things which aren't comments:
           (                ## 
                            ##  ------ " ... " STRING ------
             "              ##  Start of " ... " string
             (              ##
               \\.          ##  Escaped char
             |              ##  -OR-
               [^"\\]       ##  Non "\ characters
             )*             ##
             "              ##  End of " ... " string
           |                ##  -OR-
                            ##
                            ##  ------ ' ... ' STRING ------
             '              ##  Start of ' ... ' string
             (              ##
               \\.          ##  Escaped char
             |              ##  -OR-
               [^'\\]       ##  Non '\ characters
             )*             ##
             '              ##  End of ' ... ' string
           |                ##  -OR-
                            ##
                            ##  ------ ANYTHING ELSE -------
             .              ##  Any other char
             [^/"'\\]*      ##  Chars which do not start a comment, string
           )                ##    or escape
    """
    regex = re.compile(pattern, re.VERBOSE|re.MULTILINE|re.DOTALL)
    noncomments = [m.group(2) for m in regex.finditer(text) if m.group(2)]

    return "".join(noncomments)

if __name__ == '__main__':
    filename = sys.argv[1]
    code_w_comments = open(filename).read()
    code_wo_comments = remove_comments(code_w_comments)
    fh = open(filename+".nocomments", "w")
    fh.write(code_wo_comments)
    fh.close()

Example:
To test the script, I created a test file called testfile.c:

/* This is a C-style comment. */
This is not a comment.
/* This is another
 * C-style comment.
 */
"This is /* also not a comment */"

Run the script:
To use the script, I put the script, remove_comments.py, and my test file, testfile.c, in the same directory and ran the following command:

python remove_comments.py testfile.c

Results:
The script created a new file called testfile.c.nocomments:

This is not a comment.

"This is /* also not a comment */"

---------------
Minor note on Perl to Python migration:
I modified the original regular expression comments a little bit. In particular, I had to put at least one character after the ## Non "\ and ## Non '\ lines because, in Python, the backslash was escaping the following newline character and the closing parenthesis on the following line was being treated as a comment by the regular expression engine. This is the error I got, before the fix:

$ python remove_comments.py
Traceback (most recent call last):
  File "remove_comments.py", line 39, in <module>
    regex = re.compile(pattern, re.VERBOSE|re.MULTILINE|re.DOTALL)
  File "C:\Programs\Python25\lib\re.py", line 180, in compile
    return _compile(pattern, flags)
  File "C:\Programs\Python25\lib\re.py", line 233, in _compile
    raise error, v # invalid expression
sre_constants.error: unbalanced parenthesis
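
Going back to the comment-stripping script itself: here is a shorter sketch of the same idea for Python 3. It is not the Friedl/Curtis regular expression. It swaps in a simpler technique, a non-greedy comment match combined with a replacement function for re.sub(), and I have only checked it against small inputs like the test file above. The names TOKEN and strip_c_comments are made up for illustration.

```python
import re

# Match a /* ... */ comment, or a double- or single-quoted string.
# Strings are matched only so that comment markers inside them are skipped.
TOKEN = re.compile(
    r"""
      /\*.*?\*/              # a C-style comment (non-greedy)
    | "(?:\\.|[^"\\])*"      # a double-quoted string
    | '(?:\\.|[^'\\])*'      # a single-quoted string
    """,
    re.VERBOSE | re.DOTALL,
)

def strip_c_comments(text):
    """Return text with /* ... */ comments removed and strings left intact."""
    def keep_strings(match):
        token = match.group(0)
        # Drop comments; put strings back unchanged
        return "" if token.startswith("/*") else token
    return TOKEN.sub(keep_strings, text)

sample = '/* a comment */\ncode();\n"not /* a comment */"\n'
print(strip_c_comments(sample))
```

Text that is neither a comment nor a string is never matched, so re.sub() passes it through untouched.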

Migrating Excel data to SQLite using Python

2007-11-27T18:13:00-08:00

In a previous post, I described how I designed a SQLite relational database from an Excel table. It was a small example, so I hardcoded the data into the Python script. For my actual problem, I need to convert my Excel data into a SQLite database automatically. To do this, I used the win32com module and the sqlite3 module included in Python 2.5.

Here is the table from my previous post. It shows some variables in my C program: the variable name, type, the module it belongs to, and a short description.

id	name	module	type	desc
1	foo	ModuleExt	double	Description of foo
2	bar	ModuleExt	double	Description of bar
3	knark	Module1	int	Description of knark
4	wert	Module1	double	Description of wert
5	jib	Module1	double	Description of jib
6	laz	Module2	double	Description of laz
7	kew	Module2	double	Description of kew

After installing the win32com module from http://sourceforge.net/project/platformdownload.php?group_id=78018, I used the following code.

import os
import sqlite3
from win32com.client import constants, Dispatch

#----------------------------------------
# get data from excel file
#----------------------------------------
XLS_FILE = os.getcwd() + "\\example.xls"
ROW_SPAN = (14, 21)
COL_SPAN = (2, 7)
app = Dispatch("Excel.Application")
app.Visible = True
ws = app.Workbooks.Open(XLS_FILE).Sheets(1)
exceldata = [[ws.Cells(row, col).Value 
              for col in xrange(COL_SPAN[0], COL_SPAN[1])] 
             for row in xrange(ROW_SPAN[0], ROW_SPAN[1])]

#----------------------------------------
# create SQL table and fill it with data
#----------------------------------------
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('''CREATE TABLE exceltable (
   id INTEGER,
   name TEXT,
   module TEXT,
   type TEXT,
   desc TEXT
)''')
for row in exceldata:
    c.execute('INSERT INTO exceltable VALUES (?,?,?,?,?)', row)
conn.commit()

#----------------------------------------
# display SQL data
#----------------------------------------
c.execute('SELECT * FROM exceltable')
for row in c:
    print row

The Excel filename is example.xls and the table data begins at B14 (2nd column, 14th row) and ends at F20 (6th column, 20th row) in Sheet 1. Because xrange() excludes its upper bound, ROW_SPAN and COL_SPAN are each set one past the last row and column. The script uses a nested list comprehension to store the data in a Python list of lists. It creates a SQLite database named example.db and creates a connection to it. It then fills the table from the Python list of lists. Finally, it prints the rows of the newly created table. The screen output is shown below.

(1, u'foo', u'ModuleExt', u'double', u'Description of foo')
(2, u'bar', u'ModuleExt', u'double', u'Description of bar')
(3, u'knark', u'Module1', u'int', u'Description of knark')
(4, u'wert', u'Module1', u'double', u'Description of wert')
(5, u'jib', u'Module1', u'double', u'Description of jib')
(6, u'laz', u'Module2', u'double', u'Description of laz')
(7, u'kew', u'Module2', u'double', u'Description of kew')

If I want to interact with the newly created database using SQLite directly, I can run sqlite3 example.db from my Cygwin bash command line. (Note the conn.commit() line in my Python script is very important for this step to work. For some reason, I didn't see this in the Python documentation.) Here is an example session using SQLite directly with the new database.

$ sqlite3 example.db
SQLite version 3.5.1
Enter ".help" for instructions
sqlite> .schema
CREATE TABLE exceltable (
   id INTEGER,
   name TEXT,
   module TEXT,
   type TEXT,
   desc TEXT
);
sqlite> select * from exceltable;
1|foo|ModuleExt|double|Description of foo
2|bar|ModuleExt|double|Description of bar
3|knark|Module1|int|Description of knark
4|wert|Module1|double|Description of wert
5|jib|Module1|double|Description of jib
6|laz|Module2|double|Description of laz
7|kew|Module2|double|Description of kew
sqlite>
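
Since the commit step is easy to forget, here is a small sketch (Python 3 syntax) of one way to make it harder to miss: using the connection as a context manager, which commits when the block finishes and rolls back if an exception is raised. The in-memory database and the two sample rows are stand-ins for illustration, not the Excel data.

```python
import sqlite3

rows = [
    (1, 'foo', 'ModuleExt', 'double', 'Description of foo'),
    (2, 'bar', 'ModuleExt', 'double', 'Description of bar'),
]

conn = sqlite3.connect(':memory:')  # in-memory database, for illustration only

with conn:  # commits on success, rolls back if an exception is raised
    conn.execute('CREATE TABLE exceltable '
                 '(id INTEGER, name TEXT, module TEXT, type TEXT, desc TEXT)')
    # executemany() runs the INSERT once per row
    conn.executemany('INSERT INTO exceltable VALUES (?,?,?,?,?)', rows)

names = [name for (name,) in conn.execute('SELECT name FROM exceltable ORDER BY id')]
print(names)
conn.close()
```

Note that the with block only manages the transaction. It does not close the connection, so close() is still called explicitly.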

python sleep

2007-11-15T22:46:00-08:00

How to sleep for 5 seconds in python:

import time

time.sleep(5)

How to sleep for 0.5 seconds in python:

import time

time.sleep(0.5)

Documentation: time.sleep()

Django project #2: SQLite setup

2007-11-15T00:26:00-08:00

In the first installment of the Sample Django Project, I installed Django and created a project. In this installment, I will set up the SQLite database. At first, I thought I had to figure out what kind of data to put in the database, but in actuality, I can create an empty database and fill it in later. That is what I am doing. Here are the steps.

Install SQLite 3

sofeng@tortoise:~$ sudo apt-get install sqlite3

Edit settings.py
cd to the mysite directory created last time.

sofeng@tortoise:~$ cd ~/Web/mysite

Edit settings.py and change the following 2 lines

DATABASE_ENGINE = 'sqlite3'
DATABASE_NAME = '/home/sofeng/Web/mysite/mydb'

Create Django tables in the database

sofeng@tortoise:~/Web/mysite$ python manage.py syncdb
Creating table auth_message
Creating table auth_group
Creating table auth_user
Creating table auth_permission
Creating table django_content_type
Creating table django_session
Creating table django_site

You just installed Django's auth system, which means you don't have any superusers defined.
Would you like to create one now? (yes/no): yes
Username (Leave blank to use 'sofeng'): 
E-mail address: youremail@yourhost.com
Password: 
Password (again): 
Superuser created successfully.
Installing index for auth.Message model
Installing index for auth.Permission model
Loading 'initial_data' fixtures...
No fixtures found.

Take a look at the tables created

sofeng@tortoise:~/Web/mysite$ sqlite3 mydb
SQLite version 3.4.2
Enter ".help" for instructions
sqlite> .schema

You should see a bunch of CREATE TABLE statements. If you get the following error message, it probably means you used sqlite instead of sqlite3.

Unable to open database "mydb": file is encrypted or is not a database

Well that was pretty easy. Next time, we'll create some models and actually write some python code.

Relational database introduction with Python and SQLite

2007-11-14T17:37:00-08:00

I mentioned in a previous post that I have an Excel spreadsheet containing a bunch of information about the development C code I'm working on. It is a large table showing which variables are input and output from which functions. The variable names are in the first column and the function names are in the last several columns of the first row. I use "i" or "o" to denote if a variable is an input to or an output from a particular function. I also have a few columns for the variable type and description. A simplified example is shown in the table below.

ORIGINAL TABLE

name	module	type	desc	ExtSource	DoThis	CalcThis	CalcThat	ModifyStuff	ExtSink
foo	ModuleExt	double	Description of foo	o	i		i
bar	ModuleExt	double	Description of bar	o	i	i
knark	Module1	int	Description of knark		io	i
wert	Module1	double	Description of wert		o		i	i
jib	Module1	double	Description of jib			o		i
laz	Module2	double	Description of laz			o		o	i
kew	Module2	double	Description of kew				o	o	i

As shown above, the table also includes the variable descriptions, type, and scope. In my real, unsimplified Excel table, I also include function-related information such as the function description, and the function prototype. This additional information makes the table very messy. Try to imagine where you would put the function description and function prototype information in the table above. I came up with a kludge solution, but it involved repeated information, and rows and columns with inconsistent meaning. I was realizing the limitations of a 2-dimensional table.

I had been interested in learning about relational databases, so this seemed like a good opportunity. Relational databases go hand in hand with the SQL query language: relational databases that use SQL are about the only kind around. They include Oracle, Microsoft SQL Server, PostgreSQL, MySQL, and SQLite, among others. I chose SQLite for my task because it is lightweight (no separate server), free, popular, and fast for small databases. (See http://www.sqlite.org/different.html for other distinctive features.) Also, Python 2.5 now includes an interface module, sqlite3, as part of the standard distribution. See http://docs.python.org/lib/module-sqlite3.html for the documentation.

Relational model design

I could create a relational database using my original table in the Excel spreadsheet. However, this organization is not in the spirit of relational database design. The relational approach is to create many smaller tables and relationships between those tables. For my example, I created a variable table, a function table, and a variable_function table to indicate the relationship between the variable and function tables. Though it looks more complicated than the original single-table form, putting this into a relational database, such as SQLite, allows for more advanced manipulation of the data. It is also much more conducive to maintenance and expansion. For example, now there is a natural place to put the additional function-related information which was so clumsy in the original single table. To do this, I add desc and prototype columns to the function table. If I needed to add more information about the different modules, I could create a separate module table and include the module_id as a column in the variable table.

VARIABLE TABLE

id	name	module	type	desc
1	foo	ModuleExt	double	Description of foo
2	bar	ModuleExt	double	Description of bar
3	knark	Module1	int	Description of knark
4	wert	Module1	double	Description of wert
5	jib	Module1	double	Description of jib
6	laz	Module2	double	Description of laz
7	kew	Module2	double	Description of kew

FUNCTION TABLE

id	name
1	ExtSource
2	DoThis
3	CalcThis
4	CalcThat
5	ModifyStuff
6	ExtSink

VARIABLE_FUNCTION TABLE

variable_id	function_id	type
1	1	output
1	2	input
1	4	input
2	1	output
2	2	input
2	3	input
3	2	input-output
3	3	input
4	2	output
4	4	input
4	5	input
5	3	output
5	5	input
6	3	output
6	5	output
6	6	input
7	4	output
7	5	output
7	6	input

SQLite Implementation (w/ Python)

To implement this database, all I needed was Python 2.5. If you have an older version of Python, you can install the pysqlite module. Here is the Python code to implement the database above.

import sqlite3

# data
VARIABLES = (
    (1, 'foo', 'ModuleExt', 'double', 'Description of foo'),
    (2, 'bar', 'ModuleExt', 'double', 'Description of bar'),
    (3, 'knark', 'Module1', 'int', 'Description of knark'),
    (4, 'wert', 'Module1', 'double', 'Description of wert'),
    (5, 'jib', 'Module1', 'double', 'Description of jib'),
    (6, 'laz', 'Module2', 'double', 'Description of laz'),
    (7, 'kew', 'Module2', 'double', 'Description of kew'),
)
FUNCTIONS = (
    (1, 'ExtSource'),
    (2, 'DoThis'),
    (3, 'CalcThis'),
    (4, 'CalcThat'),
    (5, 'ModifyStuff'),
    (6, 'ExtSink'),
)
VAR_FUNC = (
    (1, 1, 'output'),
    (1, 2, 'input'),
    (1, 4, 'input'),
    (2, 1, 'output'),
    (2, 2, 'input'),
    (2, 3, 'input'),
    (3, 2, 'input-output'),
    (3, 3, 'input'),
    (4, 2, 'output'),
    (4, 4, 'input'),
    (4, 5, 'input'),
    (5, 3, 'output'),
    (5, 5, 'input'),
    (6, 3, 'output'),
    (6, 5, 'output'),
    (6, 6, 'input'),
    (7, 4, 'output'),
    (7, 5, 'output'),
    (7, 6, 'input'),
)

# get connection and cursor objects
conn = sqlite3.connect('iodatabase')
c = conn.cursor()

# create tables
c.execute('''create table variable (
    id integer,
    name text,
    module text,
    type text,
    desc text
)''')
c.execute('''create table function (
    id integer,
    name text
)''')
c.execute('''create table var_func (
    variable_id integer,
    function_id integer,
    type text
)''')

# fill tables with data
for row in VARIABLES:
    c.execute('insert into variable values (?,?,?,?,?)', row)
for row in FUNCTIONS:
    c.execute('insert into function values (?,?)', row)
for row in VAR_FUNC:
    c.execute('insert into var_func values (?,?,?)', row)

# commit the inserts so they are saved to the database file
conn.commit()

Querying the Database

Now that I have created the database, I can get answers to interesting questions, such as "What variables are output by CalcThis?" Here is the Python/SQLite code to answer this question:

c.execute(''.join([
            'SELECT variable.name, variable.module, variable.type, variable.desc ',
            'FROM variable, var_func, function ',
            'WHERE variable.id=var_func.variable_id ', 
            'AND function.id=var_func.function_id ',
            'AND function.name="CalcThis" ',
            'AND var_func.type="output" ',
            ]))
FORMAT = '%-8s%-10s%-8s%-20s'
print FORMAT % ('name', 'module', 'type', 'desc')
print '-' * 44
for row in c:
    print FORMAT % row

Here is the output.

name    module    type    desc
----------------------------------------------
jib     Module1   double  Description of jib
laz     Module2   double  Description of laz

I can see this is consistent with my original table. The query works. For such a small example, the original method may seem easier, but as the number of entries grows, the benefit of the relational database grows as well. Here is another example which asks, "Which functions use the variable wert as an input?"

c.execute(''.join([
            'SELECT function.name ',
            'FROM variable, var_func, function ',
            'WHERE variable.id=var_func.variable_id ', 
            'AND function.id=var_func.function_id ',
            'AND variable.name="wert" ',
            'AND var_func.type="input" ',
            ]))
print 'name'
print '------------'
for row in c:
    print '%s' % row

Output:

name
---------
CalcThat
ModifyStuff
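
The two queries above put the search values directly in the SQL text using double quotes. SQLite accepts that, but standard SQL uses single quotes for string literals, and ? placeholders avoid the quoting question altogether. Here is a sketch of the first query rewritten with explicit JOINs and placeholders, in Python 3 syntax. It uses an in-memory database, trimmed-down tables, and only a subset of the rows, all chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # in-memory database, for illustration only
c = conn.cursor()

# Trimmed-down versions of the three tables (only the columns the query needs)
c.execute('create table variable (id integer, name text)')
c.execute('create table function (id integer, name text)')
c.execute('create table var_func (variable_id integer, function_id integer, type text)')

c.executemany('insert into variable values (?,?)',
              [(4, 'wert'), (5, 'jib'), (6, 'laz')])
c.executemany('insert into function values (?,?)',
              [(3, 'CalcThis'), (4, 'CalcThat'), (5, 'ModifyStuff')])
c.executemany('insert into var_func values (?,?,?)', [
    (4, 4, 'input'), (4, 5, 'input'),
    (5, 3, 'output'), (5, 5, 'input'),
    (6, 3, 'output'), (6, 5, 'output'),
])
conn.commit()

QUERY = '''
    SELECT variable.name
    FROM variable
    JOIN var_func ON variable.id = var_func.variable_id
    JOIN function ON function.id = var_func.function_id
    WHERE function.name = ? AND var_func.type = ?
    ORDER BY variable.id
'''

# Which variables are output by CalcThis?
outputs = [name for (name,) in c.execute(QUERY, ('CalcThis', 'output'))]
print(outputs)
```

Because the function name and the type are parameters, the same QUERY string can be reused for other functions without building new SQL text.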

I would like to do even more complicated things like determine the prerequisite input variables across all functions for a given output variable. However, I still need to read more about that, so that will have to wait.

Django project #1: Install

2007-11-09T22:28:00-08:00

I plan to do a side project in Django just to get some web programming experience. I'm not sure exactly what the site will be yet. Maybe my very own blog software with extensive tagging capabilities and maybe revision control. I also have an idea for a site to specialize in comparison searching. E.g. "python vs. ruby", "mac vs. linux", "kde vs. gnome", etc. Or, if I find someone who needs a website, I might work on someone else's project. As long as I get to choose the technology. Anyways, I plan to document my steps here. I've already installed Ubuntu. Here are my steps in creating my Django website.

Install django

$ sudo apt-get install python-django

Test install

To test the install, I typed import django in the python interpreter.

$ python
Python 2.5.1 (r251:54863, Oct  5 2007, 13:36:32) 
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import django
>>>

Alright, it worked.

Create a project

I tried to follow the Django tutorial and ran the django-admin.py command. However, I got the command not found error message.

$ django-admin.py startproject mysite
bash: django-admin.py: command not found

I googled for the error message and found this link which said to link the django-admin.py file to a location in your path.

$ sudo ln -s /usr/lib/python-django/bin/django-admin.py /usr/local/bin/django-admin.py

I tried the django-admin.py command again, and now got a Permission denied error.

$ django-admin.py startproject mysite
bash: /usr/local/bin/django-admin.py: Permission denied

Googling again, I found the SVN release version of the tutorial which gave instructions about the Permission denied error. (I was using the 0.96 version of the tutorial). So I changed the permissions of the django-admin.py file to be executable.

$ sudo chmod +x /usr/lib/python-django/bin/django-admin.py

I tried the command again.

$ django-admin.py startproject mysite

Alright, success! I got no errors.

Start the development web server

$ cd mysite
$ python manage.py runserver
Validating models...
0 errors found.

Django version 0.96, using settings 'mysite.settings'
Development server is running at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

Alright, it worked! I know you are excited. I visited http://127.0.0.1:8000/ in my browser and got a nice page confirming my success.

It worked!
Congratulations on your first Django-powered page.

This seems like a good stopping point. The next step is setting up the database.

How to get current working directory in Python

2007-11-02T11:41:00-07:00

I always forget how to get the current working directory so here it is, mainly for my own reference.

import os 
os.getcwd() 
'C:\\Python25'
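
On Python 3.4 and later, the pathlib module in the standard library offers the same information as a Path object. Here is a small sketch; the file name data.txt is just a hypothetical example.

```python
import os
from pathlib import Path

cwd_str = os.getcwd()   # current working directory as a str
cwd_path = Path.cwd()   # the same directory as a Path object

print(cwd_str)
print(cwd_path / "data.txt")  # "data.txt" is a hypothetical file name
```

The Path version is handy when the next step is joining on a file name, since the / operator takes care of the separator.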

Using Python's finditer to highlight search items

2007-10-16T17:39:00-07:00

I am trying to search through various text and highlight certain search terms within that text using HTML markup. As an example, if I take a paragraph of text from Paul Prescod's essay, I would like to highlight the search terms "lisp", "python", "perl", "java", and "C" each in different colors. My first attempt at this problem looked something like:

for sentence in re.split(r"[?.]\s+", text):
    match = re.search(r"\blisp\b", sentence, re.I)
    if match:
        color = 'red'
    else:
        match = re.search(r"\bpython\b", sentence, re.I)
        if match:
            color = 'blue'
        else:
            match = re.search(r"\bperl\b", sentence, re.I)
            if match:
                color = 'orange'

I didn't finish it because, not only is it ugly and verbose, it doesn't do what I want. Instead of matching all the search terms, it only matches the first one in each sentence. Fortunately, I took some time to rethink the problem (i.e. search the internet (this thread on the Python mailing list was helpful (I guess my Perl background is still showing) as was this article which I previously referenced. (hmmm, this is starting to look like Lisp.))) and made a prettier (and correct) version using my new favorite regular expression method, finditer, and the MatchObject's lastindex attribute. Here is the working example:

import re

COLOR = ['red', 'blue', 'orange', 'violet', 'green']

text = """Graham says that Perl is cooler than Java and Python than Perl. In some circles, maybe. Graham uses the example of Slashdot, written in Perl. But what about Advogato, written in C? What about all of the cool P2P stuff being written in all three of the languages? Considering that Perl is older than Java, and was at one time the Next Big Language, I think you would have a hard time getting statistical evidence that programmers consider Perl "cooler" than Java, except perhaps by virtue of the fact that Java has spent a few years as the "industry standard" (and is thus uncool for the same reason that the Spice Girls are uncool) and Perl is still "underground" (and thus cool, for the same reason that ambient is cool). Python is even more "underground" than Perl (and thus cooler?). Maybe all Graham has demonstrated is that proximity to Lisp drives a language underground. Except that he's got the proximity to Lisp argument backwards too."""

regex = re.compile(r"(\blisp\b)|(\bpython\b)|(\bperl\b)|(\bjava\b)|(\bc\b)", re.I)

i = 0; output = "<html>"
for m in regex.finditer(text):
    output += "".join([text[i:m.start()],
                       "<strong><span style='color:%s'>" % COLOR[m.lastindex-1],
                       text[m.start():m.end()],
                       "</span></strong>"])
    i = m.end()
print "".join([output, text[i:], "</html>"])

This example loops over each match in the iterator returned by finditer. For each match, the non-matching text before it and the matching text wrapped in an HTML <span> tag are appended to the output string. start() and end() return the indices of the start and end positions of the matching text. The color of the text is determined by using lastindex to index into a list of colors. lastindex is the integer index of the last matched capturing group. So, it is 1 if "lisp" is matched, 2 if "python" is matched, 3 if "perl" is matched, and so on. I need to subtract 1 because list indexing starts at 0. The last line adds on the rest of the non-matching text and prints it. When viewed in a browser, it looks something like this:

Graham says that Perl is cooler than Java and Python than Perl. In some circles, maybe. Graham uses the example of Slashdot, written in Perl. But what about Advogato, written in C? What about all of the cool P2P stuff being written in all three of the languages? Considering that Perl is older than Java, and was at one time the Next Big Language, I think you would have a hard time getting statistical evidence that programmers consider Perl "cooler" than Java, except perhaps by virtue of the fact that Java has spent a few years as the "industry standard" (and is thus uncool for the same reason that the Spice Girls are uncool) and Perl is still "underground" (and thus cool, for the same reason that ambient is cool). Python is even more "underground" than Perl (and thus cooler?). Maybe all Graham has demonstrated is that proximity to Lisp drives a language underground. Except that he's got the proximity to Lisp argument backwards too.
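
For comparison, here is a rough Python 3 sketch of the same finditer/lastindex idea wrapped in a function. The names highlight, TERMS, COLORS, and REGEX are my own for this sketch, and the html.escape calls are an addition that the example above does not have.

```python
import html
import re

COLORS = ["red", "blue", "orange", "violet", "green"]
TERMS = ["lisp", "python", "perl", "java", "c"]

# One capturing group per term, so m.lastindex identifies which term matched.
REGEX = re.compile("|".join(r"(\b%s\b)" % re.escape(t) for t in TERMS), re.I)


def highlight(text):
    """Return text as an HTML fragment with each search term colored."""
    pieces = []
    pos = 0
    for m in REGEX.finditer(text):
        pieces.append(html.escape(text[pos:m.start()]))
        pieces.append(
            "<strong><span style='color:%s'>%s</span></strong>"
            % (COLORS[m.lastindex - 1], html.escape(m.group()))
        )
        pos = m.end()
    pieces.append(html.escape(text[pos:]))  # trailing text after the last match
    return "".join(pieces)


result = highlight("Perl is older than Java & C.")
print(result)
```

Collecting the pieces in a list and joining once at the end avoids repeated string concatenation, and tracking pos separately means the function still works when there are no matches at all.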

Using Python's finditer for Lexical Analysis

2007-10-16T17:35:00-07:00

Fredrik Lundh wrote a good article called Using Regular Expressions for Lexical Analysis which explains how to use Python regular expressions to read an input string and group characters into lexical units, or tokens. The author's first group of examples read in a simple expression, "b = 2 + a*10", and output strings classified as one of three token types: symbols (e.g. a and b), integer literals (e.g. 2 and 10), and operators (e.g. =, +, and *). His first three examples use the findall method and his fourth example uses the undocumented scanner method from the re module. Here is the example code from the fourth example. Note that the "1" in the first column of the results corresponds to the integer literals token group, "2" corresponds to the symbols group, and "3" to the operators group.

import re

expr = "b = 2 + a*10"
pos = 0
pattern = re.compile("\s*(?:(\d+)|(\w+)|(.))")
scan = pattern.scanner(expr)
while 1:
    m = scan.match()
    if not m:
        break
    print m.lastindex, repr(m.group(m.lastindex))

Here are the results:

2 'b'
3 '='
1 '2'
3 '+'
2 'a'
3 '*'
1 '10'

Since this article was dated 2002, and the author was using Python 2.0, I wondered if this was the most current approach. The author notes that recent versions (i.e. version 2.2 or later) of Python allow you to use the finditer method which uses an internal scanner object. Using finditer makes the example code much simpler. Here is Fredrik's example using finditer:

import re

expr = "b = 2 + a*10"
regex = re.compile("\s*(?:(\d+)|(\w+)|(.))")
for m in regex.finditer(expr):
    print m.lastindex, repr(m.group(m.lastindex))

Running it produces the same results as the original.
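
As a variation, here is a sketch for Python 3 that uses named groups and the match object's lastgroup attribute, so each token comes back with a readable type name instead of a group number. The group names (integer, symbol, operator) and the tokenize function are my own choices, not something from Fredrik's article.

```python
import re

# Named groups instead of positional ones; the names are illustrative.
TOKEN_RE = re.compile(r"\s*(?:(?P<integer>\d+)|(?P<symbol>\w+)|(?P<operator>.))")


def tokenize(expr):
    """Yield (token_type, text) pairs for each token in expr."""
    for m in TOKEN_RE.finditer(expr):
        yield m.lastgroup, m.group(m.lastgroup)


tokens = list(tokenize("b = 2 + a*10"))
for kind, text in tokens:
    print(kind, repr(text))
```

The order of the alternatives still matters: \d+ has to come before \w+, otherwise "10" would be classified as a symbol.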

Python profiling of 2 string concatenation techniques

2007-10-09T18:22:00-07:00

In Efficient String Concatenation in Python, the author tests the performance of 6 methods for concatenating strings in Python. I wanted to test the methods myself, for the experience, and also because the article was a few years old. It turns out the timing module he used for performance profiling is no longer included in Python 2.5. So I went to the Python documentation and found that there are 3 profilers currently included with Python 2.5: profile, cProfile, and hotshot. (See 25. The Python Profilers in the Library Reference for more info.) I made a quick choice to use cProfile and tried out the fastest and slowest of the 6 methods. Below is the code and the results. It turns out, for me, the difference between the two methods is not as large as in the original test: 2.515 seconds for the first method versus 1.734 seconds for the second. (Maybe because this was improved between Python 2.2 and 2.5? I'm guessing.) However, the second method is much more concise, and appears to be the more elegant, declarative approach that I have found myself reading about recently.

test.py:

import cProfile

BIG_NUMBER = 1000000

def method1():
    mystring = ''
    for i in xrange(BIG_NUMBER):
        mystring += `i`
    return mystring

def method2():
    return ''.join([`i` for i in xrange(BIG_NUMBER)])

cProfile.run('method1()')
cProfile.run('method2()')

Results:

$ python test.py
         3 function calls in 2.515 CPU seconds
   
   Ordered by: standard name
   
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.515    2.515 <string>:1(<module>)
        1    2.515    2.515    2.515    2.515 test.py:5(method1)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
         
         
         4 function calls in 1.734 CPU seconds
   
   Ordered by: standard name
   
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.734    1.734 <string>:1(<module>)
        1    1.609    1.609    1.734    1.734 test.py:11(method2)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.125    0.125    0.125    0.125 {method 'join' of 'str' objects}
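
For anyone repeating this on Python 3, here is a rough equivalent using the timeit module instead of cProfile (my substitution, not what I ran above). Backticks and xrange no longer exist in Python 3, so the sketch uses repr() and range(), and N is deliberately smaller than BIG_NUMBER so it finishes quickly. No timings are shown because they depend on the machine.

```python
import timeit

N = 100000  # smaller than the original BIG_NUMBER so the sketch runs quickly


def method1():
    mystring = ""
    for i in range(N):
        mystring += repr(i)
    return mystring


def method2():
    return "".join([repr(i) for i in range(N)])


# Both approaches must build the same string before timing means anything.
same = method1() == method2()

for func in (method1, method2):
    # Take the best of 3 runs, each run calling the function 3 times.
    seconds = min(timeit.repeat(func, number=3, repeat=3))
    print("%s: %.3f seconds for 3 calls" % (func.__name__, seconds))
```

timeit measures wall-clock time for the whole call, which is enough for comparing two functions; cProfile is the better tool when I want the per-function breakdown shown above.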

Python finditer regular expression example

2007-10-03T12:15:00-07:00

I often process text line by line using the splitlines() method with a for loop. This works great most of the time. Sometimes, however, the text is not neatly divisible into lines, or I need to match multiple items per line. This is where the re module's finditer function can help. finditer returns an iterator over all non-overlapping matches for the regular expression pattern in the string. (See docs.) It is a powerful tool for text processing and one that I don't use often enough.

Here is a simple example which demonstrates the use of finditer. It reads in a page of html text, finds all the occurrences of the word "the" and prints "the" and the following word. It also prints the character position of each match using the MatchObject's start() method. (See docs.) Note that, for simplicity, I didn't mess with the HTML tags at all. I just pretended it was plain text. Oh, and the example text is taken from Steve Yegge's article: How To Make a Funny Talk Title Without Using The Word "Weasel"

Python code:

import re
import urllib2

html = urllib2.urlopen('http://steve-yegge.blogspot.com/2007/08/how-to-make-funny-talk-title-without.html').read()
pattern = r'\b(the\s+\w+)\s+'
regex = re.compile(pattern, re.IGNORECASE)
for match in regex.finditer(html):
    print "%s: %s" % (match.start(), match.group(1))

Results:

1301: The Word
12291: The Word
13367: the cut
14025: the car
15050: the free
15513: the third
15558: the sessions
15617: the ONLY
15684: the ground
15911: the OSI
15933: The Attack
16051: The gist
16115: the term
16178: the creator
16741: the thing
16850: the same
16877: the thing
16942: the next
17131: the talk
17374: the room
17727: the hell
17782: the term
17830: the 1980s
18083: the whole
18158: the same
18230: the mountain
18305: the seat
18537: The pro
18718: the banner
18928: the poor
19006: the midst
19223: the buzzwagon
19326: the source
19437: the OSI
19855: the OSI
19927: the other
20055: the Ten
20404: The 22
20517: the OSI
20616: the book
21098: the collective
21553: the proposed
21681: the Five
21932: the nearest
22690: The rest
22858: the entertaining
23255: the crap
23561: the next
23661: the registration
23963: the registration
24114: the restaurant
24289: the people
24456: the second
24597: the current
24871: The Style
24929: the front
25047: the curtain
25132: the movie
25159: The hospital
25249: the night
25881: the way
25892: the rear
25927: the crowd
26194: the podium
26262: the front
26521: the door
26593: the front
26622: The economist
27128: the thing
27228: The next
27290: the Pirate
27409: the material
27461: the crowd
27621: the next
27916: The technician
28084: the way
28487: the technician
28735: the exciting
35709: The Next
36587: The Pinocchio
45436: the Kingdom
45679: The Truth
51623: the same
52526: The Word
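
Here is a smaller version of the same idea that runs on Python 3 without a network connection. The sample sentence is made up for this sketch; the pattern is the one used above.

```python
import re

# Made-up sample text; the pattern is the same one used above.
text = "The quick fox saw the lazy dog near the old barn."
regex = re.compile(r"\b(the\s+\w+)\s+", re.IGNORECASE)

matches = [(m.start(), m.group(1)) for m in regex.finditer(text)]
for start, phrase in matches:
    print("%s: %s" % (start, phrase))
```

One thing to note about the pattern: because it ends with \s+, a phrase such as "the barn." that is followed by punctuation instead of whitespace would not be reported.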

Python word wrap function

2007-09-21T13:02:00-07:00

Update 2008-09-18: I just found out there is a textwrap module in the Python Standard Library. See the docs at: http://docs.python.org/lib/module-textwrap.html.

I'm working on creating function headers for our C development code. These headers include the function name, a purpose statement, input/output lists, etc. So my solution (because I'm lazy and prefer scripting to writing hundreds of lines of not-actually-code) is to write a Python script to write the headers. I put all the pertinent information (function names, variable names, descriptions, etc.) in an Excel spreadsheet and I'm using Python to read the data and properly format it. (See here and here for more info on using Python w/ Excel.) Things are going pretty well. The spreadsheet (and the script) are getting a little messy, but overall, it is faster (I think), more fun (definitely) and I believe will produce more consistent results. It also allows me to be more flexible if names or descriptions change in the future because I only need to change the information once in the spreadsheet instead of multiple places in the actual .c files.

One of the rules for the function headers is that they must be 80 columns or less in width. It is annoying to wrap everything by hand so I wrote a Python function to do the wrapping. (I know Emacs could probably do all this in about 2 lines of Lisp, but I haven't learned that much yet.) Here is the script with a couple of examples:

def word_wrap(string, width=80, ind1=0, ind2=0, prefix=''):
    """ word wrapping function.
        string: the string to wrap
        width: the column number to wrap at
        prefix: prefix each line with this string (goes before any indentation)
        ind1: number of characters to indent the first line
        ind2: number of characters to indent the rest of the lines
    """
    string = prefix + ind1 * " " + string
    newstring = ""
    while len(string) > width:
        # find position of nearest whitespace char to the left of "width"
        marker = width - 1
        while not string[marker].isspace():
            marker = marker - 1

        # remove line from original string and add it to the new string
        newline = string[0:marker] + "\n"
        newstring = newstring + newline
        string = prefix + ind2 * " " + string[marker + 1:]

    return newstring + string

Example 1: wrap at 60 characters

string = "PURPOSE: To end world hunger, create peace for all people, solve all technological and scientific problems, make an exorbitant amount of money, and remain humble in the process."
print word_wrap(string, 60)

Results:

PURPOSE: To end world hunger, create peace for all people,
solve all technological and scientific problems, make an
exorbitant amount of money, and remain humble in the
process.

Example 2: wrap at 60 chars, with a prefix, and a hanging indent

print word_wrap(string, 60, 0, 9, " * ")

Results:

 * PURPOSE: To end world hunger, create peace for all
 *          people, solve all technological and scientific
 *          problems, make an exorbitant amount of money,
 *          and remain humble in the process.
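
Since the update at the top mentions the textwrap module, here is a sketch of how Example 2 might look with textwrap.fill. Mapping prefix and ind2 onto initial_indent and subsequent_indent is my own translation of the arguments, and I have not checked that the line breaks come out identical to word_wrap's.

```python
import textwrap

string = (
    "PURPOSE: To end world hunger, create peace for all people, "
    "solve all technological and scientific problems, make an "
    "exorbitant amount of money, and remain humble in the process."
)

wrapped = textwrap.fill(
    string,
    width=60,
    initial_indent=" * ",                # prefix on the first line
    subsequent_indent=" * " + 9 * " ",   # prefix plus a 9-character hanging indent
)
print(wrapped)
```

Unlike word_wrap, textwrap also copes with a word that is longer than the width, so it is probably the safer choice for real input.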

How to sort a Python dict (dictionary) by keys or values

2007-09-13T16:21:00-07:00

Updated to work with both Python 2 and 3

How to sort a dict by key

mydict = {
    "carl": 40,
    "alan": 2,
    "bob": 1,
    "danny": 3,
}

for key in sorted(mydict.keys()):
    print("%s: %s" % (key, mydict[key]))

Results:

alan: 2
bob: 1
carl: 40
danny: 3

To sort the keys in reverse, add reverse=True as a keyword argument to the sorted function.

How to sort a dict by value

for key, value in sorted(mydict.items(), key=lambda item: item[1]):
    print("%s: %s" % (key, value))

Results:

bob: 1
alan: 2
danny: 3
carl: 40
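
A small variation, combining the two ideas above: sort by value in descending order by passing reverse=True, with operator.itemgetter standing in for the lambda. The variable name by_value_desc is just for this sketch.

```python
from operator import itemgetter

mydict = {
    "carl": 40,
    "alan": 2,
    "bob": 1,
    "danny": 3,
}

# itemgetter(1) picks the value out of each (key, value) pair.
by_value_desc = sorted(mydict.items(), key=itemgetter(1), reverse=True)
for key, value in by_value_desc:
    print("%s: %s" % (key, value))
```

This prints carl first and bob last, and it works on both Python 2 and 3.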

Originally taken from Nick Galbreath's Digital Sanitation Engineering blog article

backing up with rsync

2007-08-31T13:55:00-07:00

Here is a Python script using rsync to back up my Users directory in Vista to an external hard drive.

import os

cmd = "rsync -avz --exclude '/AppData/' " + \
      "/cygdrive/c/Users/saltycrane " + \
      "/cygdrive/f/backup/Users"
os.system(cmd)
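
An alternative sketch for Python 3.5 or later: build the command as a list and hand it to subprocess.run, so there is no string to get the spaces or quoting wrong in. The function names build_rsync_command and run_backup are made up for this example, and run_backup is defined but not called here because it needs rsync and the drives from my setup.

```python
import subprocess


def build_rsync_command(source, dest, excludes=()):
    """Return the rsync argument list; no shell quoting is needed."""
    cmd = ["rsync", "-avz"]
    for pattern in excludes:
        cmd.append("--exclude=%s" % pattern)
    cmd.extend([source, dest])
    return cmd


def run_backup():
    """Run the backup using the same paths as the script above."""
    cmd = build_rsync_command(
        "/cygdrive/c/Users/saltycrane",
        "/cygdrive/f/backup/Users",
        excludes=["/AppData/"],
    )
    # check=True raises CalledProcessError if rsync exits with a non-zero status
    subprocess.run(cmd, check=True)


print(build_rsync_command("/src/foo", "/dest", excludes=["/AppData/"]))
```

Calling run_backup() performs the actual copy. Keeping the command construction in its own function makes it easy to print and inspect before running anything.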

Why not to use the backup program that comes with your external hard drive:

Do not, whatever you do, feed your valuable data to a program that is going to save it in a file format that can only be read by that program, or by that kind of computer. Because when the program can’t or the computer can’t, you’re out of options. --Tim Bray, Protecting Your Data

Why to use rsync:

rsync -essh -rtpvz rocks. --Mark Pilgrim, Essentials

On trailing slashes

from the rsync man page:

A trailing slash on the source changes this behavior to avoid creating an additional directory level at the destination. You can think of a trailing / on a source as meaning “copy the contents of this directory” as opposed to “copy the directory by name”, but in both cases the attributes of the containing directory are transferred to the containing directory on the destination. In other words, each of the following commands copies the files in the same way, including their setting of the attributes of /dest/foo:

rsync -av /src/foo /dest
rsync -av /src/foo/ /dest/foo

Use rsync with sudo on the remote host (added 2011-09-07)

Use the --rsync-path option. e.g.

$ rsync -avz --delete --rsync-path="sudo rsync" /tmp/something myhost:/some/path

My software tools list

2007-08-10T13:32:00-07:00

Inspired by Mark Pilgrim's 2006 Essentials list, below is a list of my current software tools. If you notice a lot of "I switched from ..." statements, keep in mind that I am a programmer who likes shiny things.

Other lists

Mark Pilgrim's Essentials, 2008 edition
Adam Gomaa's Essentials (2008)
Here is a 2010 update for Mark Pilgrim. After 2 and a half years, I still have several items in common: Ubuntu running rxvt-unicode and Emacs 23, the Unicomp keyboard, an Android phone, Gmail, Google Reader, Google Docs, and Pandora.
Salvatore Sanfilippo on usesthis.com (2011)
Aaron Boodman on usesthis.com (2011)

Operating System: Ubuntu
Window Manager: Qtile
Editor: Emacs
Terminal: urxvt + screen
Version Control System: Git
"High-high-level" Language: Python
Web Framework: Django/Flask

Web Browser: Firefox
Email: Gmail
Graphical Diff: KDiff3
Keyboard: Leopold Tenkeyless (MX Browns)
Office Chair: Undecided
Mobile Phone: Motorola Droid 4

My software tools list

Category	Currently using	Comments
Operating System ¶	Ubuntu 14.04 Trusty Tahr	My first Ubuntu install was in 2007 (dual-boot) and I got my first Windows-free machine in 2008. I have used Cygwin and coLinux when on Windows. Cygwin integrates better with Windows applications, but coLinux is super fast and allows you to run a full Linux distro on top of Windows. Windows does have some advantages, but overall I prefer Linux. Update 2008-09: Switching jobs means I can now use Linux at work. I'm now using Linux about 90% of the time. My wife still uses Windows Vista on our laptop. I don't have enough experience with OSX to draw any authoritative conclusions, though I think Mark Pilgrim has biased me against Apple. Also, I think Linux's free as in beer (and somewhat related free as in speech) characteristics vs. Mac's expensive (and somewhat related proprietary) characteristics resonate with the cheap engineer in me. Additional Linux vs. Mac commentary: JWZ 2000: Unix/linux sucks less, but it still sucks. JWZ 2005, 2007: Linux sucks more. Paul Graham 2005: Return of the Mac. Mark Pilgrim 2006: Bye, Apple Cory Doctorow 2006: Me too. Tim Bray 2006: Me too, Back to the Mac Steve Yegge 2008: Switching to OSX for the fonts Of course, Linus Torvalds "prefers" Linux (2008) Donald Knuth, author of The Art of Computer Programming, uses Ubuntu Linux for work, and Macs for play (2008). Salvatore Sanfilippo (author of Redis) chooses to use a Mac so he can "focus on what <he's> doing without spending hours trying to fix unrelated desktop stuff" but says Linux is "still better for coding" (2009). Ted Dziuba: MacOS X is an Unsuitable Platform for Web Development (2011) Bozhidar Batsov: The Linux desktop experience is killing Linux on the desktop (2011) History: 1987: Apple GS/OS, 1994: Windows 3.1, 2000: Windows 98, 2001: Windows XP, 2007: Windows Vista, 2007: Ubuntu
Window Manager ¶	Qtile	The WM written in Python, recommended by Matt Harrison. Previously, wmii: dynamic, tiling, scriptable window manager that doesn't require a mouse. It sucks less. I switched from ratpoison at the same time I started using coLinux because running native Linux allowed me to use any Linux window manager as well. Recently, some have switched from wmii to xmonad, the new Haskell tiling window manager. It has some nice features over wmii, including dual head support, but after a brief excursion, I slightly prefer wmii's way of doing things. If you're a hard core Lisper, stumpwm is the window manager for you. It has a REPL. This one seems a little too hard core for me, especially since I don't know Lisp. History: 2007: ratpoison, 2007: wmii, 2012: Qtile
Editor/IDE ¶	GNU Emacs 24.3	Switched from Eclipse in 2007. It was a slow transition, but I think Emacs is worth the investment. On Emacs vs. Vim: I really like that Vim can be used easily on remote machines. Currently, I use TRAMP for remote file editing. It is very cool, but it is slow, and I sometimes fall back on nano (gasp) for a quick config file edit when I am logged into a remote terminal. I also suspect that the dual modes of Vim are more efficient than Emacs-- I'm just not smart enough to get used to it. Maybe someday I will switch to Viper mode. Why Emacs over Vim? I choose Emacs because it is more powerful and closer to an IDE. Emacs Lisp allows you to do whatever you want. On Emacs vs. Eclipse: I found Eclipse was too slow and heavy, used too much screen real estate, and was too difficult to customize. Scripting Eclipse required writing extensions in Java whereas Emacs can be extended in elegant Lisp. I think Eclipse (and other IDE's) are better for code exploration and they probably have features that I'm not aware of since I don't use them. I figure, though, that with enough Emacs Lisp, Emacs can do anything those IDEs can do. If you really want those features, it's a question of if you want to spend the time achieving perfection in Emacs or getting real work done in another IDE. I choose perfection. ;) Additional commentary: THE vs. Emacs, Ian Bicking (2005) Effective Emacs, Steve Yegge (2005) Other Editors?, Ian Bicking (2005) Structured Python Editor, Glyph Lefkowitz (2008) Editor Wars: Revenge of the... 
oh, whatever., Chalain (2008) What You Can Learn From Emacs, Jack Moffitt (2008) The Emacs Test, Glyph Lefkowitz (2008) emacs extensions i can’t live without (2009) Screencasts: I need a cool European accent What You Can Learn From ido.el YASnippet demo Install method: ~~from source~~ `sudo apt-get install emacs` Useful packages: Tramp: transparent remote file access ido: Buffer switching and more Org-Mode: note taking, task lists yaml-mode: for yaml markdown-mode: for markdown YASnippet: Textmate-inspired templating. Great for making HTML less tedious. smex: IDO for M-x multiple-cursors: this is really fun and useful. web-mode: for HTML/CSS/Javascript My emacs config on github: http://github.com/saltycrane/emacs History: 2001: Emacs/NEdit/UltraEdit, 2006: Eclipse, 2007: Emacs
Terminal ¶	urxvt 9.19 + screen	urxvt supports xft (anti-aliased) fonts, real transparency (not that I actually use transparency with Qtile), and fading (which I do use with Qtile) and it is much faster and lighter than gnome-terminal or konsole. screen allows me to switch terminal sessions without ugly tabs, attach to remote sessions, search through the scrollback buffer, and more. Update: urxvt also has embedded perl. Install method: ~~from source~~ `sudo apt-get install rxvt-unicode`
Version Control System ¶	Git	Switched from Subversion to Mercurial in June 2007. The merging in Mercurial is very nice and can be done without thinking. I do miss Subversion/Subclipse's revision history viewer, file compare, and ability to isolate files apart from changesets. Update 2010-04-05: See my comment below. Update 2011-05: Switched my personal repos to Git. Use Git almost exclusively now. Additional commentary: DVCS: Why I chose Mercurial over Git (2010) Why I Switched to Git From Mercurial (2010) Install method: `sudo apt-get install mercurial` History: 2001: SCCS, 2003: Other, 2007: Subversion, 2007: Mercurial, 2010: Mercurial/Git, 2011: Git
"High-level" Language ¶	None	Previously, I used C. Now I use don't use any statically-typed languague. Wouldn't mind learning ~~C++~~ Go. Although, Linus doesn't like it.
"High-high-level" Language ¶	Python 2.7	My love for Python is strong. I switched from Perl in 2005 and have no regrets. Object-oriented, easy to read (no more TIMTOWTDI), and smart people use it. I also want to learn Javascript 2 becuase it is the "Next Big Language" and Lisp because it is the "most powerful language". On Python vs. Ruby: from what I've read, I characterize Ruby as the more expressive language more similar to Perl (than Python is) and Python as the more regimented language. Since I like regimented, I like Python. On Python vs. Lisp: I've concluded that I lack the intelligence to harness enough of Lisp's power to counteract its non-practicality (e.g. lack of libraries). Additional commentary: Paul Graham: Python is getting closer to Lisp (2002) Paul Prescod: no it isn't Paraphrase of Gerald Jay Sussman (one of the creators of Scheme): Why MIT now uses python instead of scheme for its undergraduate CS program (2009) Install method: preinstalled on Ubuntu History: 2001: Perl, 2005: Python
Web Framework ¶	Django / Flask	2014-10: I now use Flask at work. Previously, I used Django. I think I like SQLAlchemy and Jinja2 better than Django. Django has a lot more built in and is put together better. Flask's thread locals are convenient and messy. Probably Pyramid is the best but no one uses it. Here are some links: Ian Bicking: There's so much more than Rails (2005) What PHP deployment gets right (2008) Django vs. Rails (2009) Install method: `pip install Django` / `pip install Flask`
Web Browser ¶	Firefox	Not to be confused with Konqueror, Conkeror is an emacs-like, keyboard driven, scriptable, Mozilla-based web browser. I've used it almost full time since January 2008. It is still considered alpha stage software so there are a number of bugs. However, it is still pretty sweet. I use Firefox as a backup (and IE Tab for Launchcast and Netflix on Windows). Unfortunately, one of the annoying things in Firefox 2 is present in Conkeror as well-- memory leaks. Based on this Mozilla article and some brief personal experience, Firefox 3 has made fixes in this area. It would be nice if Conkeror could benefit from the Firefox 3 fixes. Update 2010-05-04: Conkeror is really awesome, but I had some memory/cpu issues with it a while ago and haven't put in the work to merge my custom keybindings with the latest Conkeror code. Also waiting to see if there will be an Emacs version of Vimium (since, after all vimperator followed conkeror). I'm currently using a combination of Google Chrome and Firefox. Chrome is faster, especially for Javascript-intensive sites, but I really hate the Google Chrome Omnibar. If you have a solution or workaround, please post an answer. Firefox also has some Add-ons that Chrome doesn't.
Email ¶	Gmail	Update 2010-05-04: Gmail. It's slow but it has a lot of features and it's in my web browser and it's easy and it works and I don't really like using email that much anyways. Oh, and, big plus, syncing works almost perfectly with my Android phone. (Actual full switch to Gmail was around December 2008.)
Graphical diff/merge ¶	KDiff3	I started using KDiff a while ago on Windows and have always liked it. I'm thinking, though, since I'm an Emacs person, I ought to use Ediff. Install method: `sudo apt-get install kdiff3`
Keyboard ¶	Unicomp Endurapro / Leopold Tenkeyless w/ Cherry MX Browns	I realize a keyboard is hardware and not software, but it is hardware that can evoke strong opinions, so I made an exception. One problem with hardware is I can't try out all the different types and choose the one I like the best as I can with software. So my selection is based on reviews not on actual experience. And, after buying the Endurapro, I found I actually did not like the integrated mouse stick, so I should have bought a Customizer 104/105 and saved $30. Update 2011-05: Got the Leopold tenkeyless for work. Tenkeyless is a must for quicker mouse access. And I actually like the lighter touch of the Cherry MX Browns versus the Unicomp's buckling springs. I like it so much I want to replace my Unicomp at home with another Leopold. Update: Got a second Leopold for home to replace my Unicomp. Additional commentary: my wrists and welcome to them., JWZ (1999) Deck keyboard, Dan (2005) Surviving Emacs - Part 4, Bill Clementson (2006) Keyboard Fight, Glyph Lefkowitz (2006) Bye, Apple, Mark Pilgrim (2006) A Meandering Review of the Logitech Illuminated Keyboard, Glyph Lefkowitz (2009) History: 2009: Unicomp Endurapro, 2011: Leopold Tenkeyless (MX Browns)
Office Chair ¶	Undecided	Herman Miller Aeron or Steelcase Leap
Mobile Phone ¶	Motorola Droid 4	After almost 10 years (and 3 generations) of the Palm Treo, I got the Motorola Droid. The Droid is pretty awesome, but I don't like switching to landscape mode to use the keyboard-- especially when toolbars and headers take up half the screen. I think Palm and Blackberry got this form factor Right. Just as I prefer the keyboard over the mouse on my PC, I want to use the keyboard and D-pad over the touchscreen on my phone. Update 2010-05-04: The keyboard is mediocre and use of the D-pad center button sucks. My Palm Treo 650 D-pad worked much better. My reasons for choosing an Android phone are: I need a physical keyboard. I want to hack on my phone. I think Android has more potential than Palm's webOS or Maemo. Useful apps: Connectbot Secure shell (SSH) client for the Android platform. Use this all the time. Awesome. Listen Listen from Google Labs brings podcast search, subscribe, download and stream to your Android-powered device. My Tracks Record GPS tracks. Monitor your performance. Share your outdoor activities with friends. Really enjoy this one. Love tracking my romping. 2010-05-04 Updates: Google Voice: Free text messages and better voicemail. Another useful app. Makes voicemail suck a lot less. FoxyRing: automatically control the ringer volume based on ambient noise level. I was doubtful about this one, but it actually works well for me. Flixter (Movies): Useful for checking movie showtimes. Bible (YouVersion): This version is pretty good. I miss my Plucker created Bible on my Palm though. I wish there was something like Plucker for Android. DroidLight: use the LED camera flash as a flashlight. Simple and useful. Works much better than using the screen as a flashlight. Google Navigation: I don't need my Garmin anymore. (Each has advantages and disadvantages, but I have found it to be a capable replacement.) Just want to mention that the Calendar sucks. 
It is great that I can sync with my Google calendar, but I really hate the Android Calendar UI. Additional Commentary/Links: Thoughts on the Palm Pre, G2, and iPhone 3G, Leonard Lin (2009 June) JWZ gets a Palm Pre (2009 June) Emacs on Android (2009 June) The Android Opportunity, John Gruber (2009 August) Dear Palm, it's just not working out, JWZ (2009 October) History: 2002: Handspring Treo 180, 2003: Handspring Treo 600, 2005: Palm Treo 650, 2009: Motorola Droid, 2011: Motorola Droid 4

More PyQt example code

2007-06-21T18:10:00-07:00

Here is some more PyQt example code.

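from PyQt4.QtCore import *
from PyQt4.QtGui import *
import sys
import re
import datetime

# Note: the constants (DEFAULT_DAYS_FROM_LAST_AUDIT, EARLY_HOUR, LATE_HOUR,
# the *_EVENTIDS lists, EVENT_DESC), __author__/__version__/__date__, and the
# EventlogParser class are not shown in this listing.

################################################################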
def main():
    app = QApplication(sys.argv)
    mw = MainWindow()
    sys.exit(app.exec_())

################################################################
class MainWindow(QMainWindow):
    def __init__(self):
        QMainWindow.__init__(self)

        # create stuff
        self.rw = ReportWidget()
        self.setCentralWidget(self.rw)
        self.sw = StartWindow()
        self.createActions()
        self.createMenus()
        self.createStatusBar()

        # create progress bar
        self.pb = QProgressBar(self.statusBar())
        self.statusBar().addPermanentWidget(self.pb)

        # connections
        self.connect(self.sw, SIGNAL("okClicked"),
                    self.rw.create)
        self.connect(self.rw.table, SIGNAL("progressChanged"),
                     self.update_progress)
        self.connect(self.rw.table, SIGNAL("displayFinished"),
                     self.hide_progress_bar)

        # format the main window
        self.setGeometry(100,100,750,550)

        # show windows
        self.show()
        self.sw.show()

    def update_progress(self, n, nrows):
        self.pb.show()
        self.pb.setRange(0, nrows)
        self.pb.setValue(n)
        self.statusBar().showMessage(self.tr("Parsing eventlog data..."))

    def hide_progress_bar(self):
        self.pb.hide()
        self.statusBar().showMessage(self.tr("Finished"))

    def about(self):
        QMessageBox.about(self, self.tr("About AIS Audit Tool"),
            self.tr("AIS Audit Tool\n\n"
                    "%s\n"
                    "%s\n"
                    "%s" % (__author__, __version__, __date__)))

    def createActions(self):
        self.exitAct = QAction(self.tr("E&xit"), self)
        self.exitAct.setShortcut(self.tr("Ctrl+Q"))
        self.exitAct.setStatusTip(self.tr("Exit the application"))
        self.connect(self.exitAct, SIGNAL("triggered()"), self, SLOT("close()"))

        self.aboutAct = QAction(self.tr("&About"), self)
        self.aboutAct.setStatusTip(self.tr("Show the application's About box"))
        self.connect(self.aboutAct, SIGNAL("triggered()"), self.about)

        self.aboutQtAct = QAction(self.tr("About &Qt"), self)
        self.aboutQtAct.setStatusTip(self.tr("Show the Qt library's About box"))
        self.connect(self.aboutQtAct, SIGNAL("triggered()"), qApp, SLOT("aboutQt()"))

    def createMenus(self):
        self.fileMenu = self.menuBar().addMenu(self.tr("&File"))
        self.fileMenu.addAction(self.exitAct)

        self.helpMenu = self.menuBar().addMenu(self.tr("&Help"))
        self.helpMenu.addAction(self.aboutAct)
        self.helpMenu.addAction(self.aboutQtAct)

    def createStatusBar(self):
        sb = QStatusBar()
        sb.setFixedHeight(18)
        self.setStatusBar(sb)
        self.statusBar().showMessage(self.tr("Ready"))

################################################################
class StartWindow(QWidget):
    def __init__(self, *args):
        QWidget.__init__(self, *args)

        # date box
        self.label_date = QLabel()
        self.label_date.setText("Set date of last audit:")
        default = datetime.date.today() - datetime.timedelta(DEFAULT_DAYS_FROM_LAST_AUDIT)
        self.datebox = QDateEdit(QDate(default.year, default.month, default.day))

        # buttons
        spacer = QSpacerItem(20,40,QSizePolicy.Minimum,QSizePolicy.Expanding)
        self.button_ok = QPushButton()
        self.button_ok.setText("OK")
        self.button_ok.setDefault(True)
        button_cancel = QPushButton()
        button_cancel.setText("Cancel")

        # layout
        layout_right = QVBoxLayout(self)
        layout_right.addWidget(self.label_date)
        layout_right.addWidget(self.datebox)
        layout_right.addItem(spacer)
        layout_right.addWidget(self.button_ok)
        layout_right.addWidget(button_cancel)

        # connections
        self.connect(button_cancel, SIGNAL("clicked(bool)"),
                    self.close)
        self.connect(self.button_ok, SIGNAL("clicked(bool)"),
                    self.ok_clicked)

    def ok_clicked(self):
        self.close()
        year = self.datebox.date().year()
        month = self.datebox.date().month()
        day = self.datebox.date().day()
        dateobj = datetime.date(int(year),int(month),int(day))
        self.emit(SIGNAL("okClicked"), dateobj)

################################################################
class ReportWidget(QWidget):
    def __init__(self, *args):
        QWidget.__init__(self, *args)
        self.cbUsers = QCheckBox("Hide SYSTEM users")
        self.cbSorting = QCheckBox("Sorting enabled")
        self.table = MyTable()
        self.textbrowser = QTextBrowser()
        self.textbrowser.setFontFamily("Courier")
        self.textbrowser.setFontPointSize(10)
        hlayout = QHBoxLayout()
        hlayout.addWidget(self.cbUsers)
        hlayout.addWidget(self.cbSorting)
        vlayout = QVBoxLayout()
        vlayout.setMargin(2)
        vlayout.addLayout(hlayout)
        vlayout.addWidget(self.table)
        self.setLayout(vlayout)
        self.setGeometry(100,100,750,550)

        # connections
        self.connect(self.cbUsers, SIGNAL("stateChanged(int)"),
                     self.cbUsersChanged)
        self.connect(self.cbSorting, SIGNAL("stateChanged(int)"),
                     self.cbSortingChanged)

    def create(self, dateobj):
        """ Parses the eventlog data, displays it in a table, and
            displays the user login/logout also """
        self.table.display_data(dateobj)

    def cbUsersChanged(self):
        state = self.cbUsers.checkState()
        if state == 0:
            self.table.show_system_users()
        elif state == 2:
            self.table.hide_system_users()

    def cbSortingChanged(self):
        state = self.cbSorting.checkState()
        if state == 0:
            self.table.setSortingEnabled(False)
        elif state == 2:
            self.table.setSortingEnabled(True)

################################################################
class MyTable(QTableWidget):
    """ Creates a custom table widget """
    def __init__(self, *args):
        QTableWidget.__init__(self, *args)
        self.setSelectionMode(self.ContiguousSelection)
        self.setGeometry(0,0,700,400)
        self.setShowGrid(False)
        self.other_users_list = []

    def hide_system_users(self):
        for n in self.other_users_list:
            self.setRowHidden(n, True)

    def show_system_users(self):
        for n in self.other_users_list:
            self.setRowHidden(n, False)

    def display_data(self, dateobj):
        """ Reads in data as a 2D list and formats and displays it in
            the table """

        print "Fetching data..."
        ep = EventlogParser()
        data = ep.parse_log(dateobj)
        print "Done."

        if len(data)==0:
            data = ["No data for this date range."]

        nrows = len(data)
        ncols = len(data[0])
        self.setRowCount(nrows)
        self.setColumnCount(ncols)
        self.setHorizontalHeaderLabels(['No.', 'Date','Time','Type','Event','User','Computer'])

        for i in xrange(len(data)):
            # update progress dialog
            if (i%20) == 0:
                self.emit(SIGNAL("progressChanged"), i, nrows)
                qApp.processEvents()

            # set each cell to be a QTableWidgetItem from the _process_row method
            items = self._process_row(data[i])
            for j in range(len(items)):
                self.setItem(i, j, items[j])
            self.setRowHeight(i, 16)

            # set column width first time through
            if i == 0:
                self.resizeColumnsToContents()
                self.setColumnWidth(4, 250)

        # format column width
        self.resizeColumnsToContents()
        self.setColumnWidth(4, 250)

        # emit signal for finished processing
        self.emit(SIGNAL("displayFinished"))

    def _process_row(self, row):
        """ Formats items in the row of the 2-D list data
            Input: the row of data from the EventlogParser in a list
            Returns a list of QTableWidgetItems to be one row in the table
        """

        icon = []
        for i in xrange(len(row)):
            # general formatting for all cells (may be overwritten)
            icon.append(QIcon())
            computer = row[6]

            # time processing
            if i == 2:
                # the hour is the part before the first colon, e.g. "1:00:51 PM" -> 1
                hour = int(re.split(r":", row[i])[0])
                if hour <= EARLY_HOUR or hour >= LATE_HOUR:
                    backcolor_time = QColor(0,0,102)
                else:
                    backcolor_time = QColor("white")

            # success or failure processing
            elif i == 3:
                if row[i] == "8":
                    row[i] = "Success"
                    icon[i] = QIcon("success.png")
                elif row[i] == "16":
                    row[i] = "Failure"
                    icon[i] = QIcon("failure.png")
                else:
                    row[i] = "Unknown"
                    icon[i] = QIcon("unknown.png")

            # event processing
            elif i == 4:
                backcolor = QColor("white")
                if row[i] in RED_EVENTIDS:
                    backcolor = QColor("red")
                elif row[i] in ORANGE_EVENTIDS:
                    backcolor = QColor("orange")
                elif row[i] in YELLOW_EVENTIDS:
                    backcolor = QColor("yellow")
                elif row[i] in GREEN_EVENTIDS:
                    pass
                elif row[i] in OTHER_EVENTIDS:
                    backcolor = QColor("blue")
                try:
                    row[i] = row[i] + ": " + EVENT_DESC[row[i]]
                except KeyError:
                    # no description known for this event id; show the bare id
                    pass

            # user processing
            elif i == 5:
                if row[i] in (computer, "", "SYSTEM", "NETWORK SERVICE", "LOCAL SERVICE", "ANONYMOUS LOGON"):
                    font = QFont("Arial", 8)
                    font.setBold(False)
                    textcolor = QColor("gray")
                    user = 'other'
                else:
                    font = QFont("Arial", 8)
                    font.setBold(True)
                    textcolor = QColor("black")
                    user = 'user'

        # create table widget item
        tableitem_list = []
        for i in xrange(len(row)):
            tableitem = QTableWidgetItem(row[i])
            if i == 2:
                tableitem.setBackgroundColor(backcolor_time)
            else:
                tableitem.setBackgroundColor(backcolor)
            tableitem.setTextColor(textcolor)
            tableitem.setFont(font)
            tableitem.setTextAlignment(Qt.AlignTop)
            tableitem.setToolTip(row[i])
            tableitem.setIcon(icon[i])
            tableitem_list.append(tableitem)

        return tableitem_list

################################################################
if __name__ == "__main__":
    main()

How to get code completion for PyQt using Pydev

2007-06-14T12:50:00-07:00

Because PyQt doesn't come with the .py source files which Pydev needs for code completion, you need to add the PyQt modules as a "forced builtin lib". See the following FAQs at the pydev website: http://pydev.sourceforge.net/faq.html#ref_22 and http://pydev.sourceforge.net/faq.html#ref_15

Here are the steps:

Go to "Window" -> "Preferences..." -> "Pydev" -> "Interpreter - Python"
In the "Forced builtin libs" section, click "New..."
Type in "PyQt4" and click "OK".
Click "OK" to close the Preferences window.

Note: If you installed PyQt after installing Pydev, you will probably have to update the PYTHONPATH with the path to the PyQt libraries. The easy way to do this is to click "Remove" on the Python interpreter and then add it again. (For reference, I am using Eclipse 3.2.2, Pydev 1.3.4, Python 2.5.1, and PyQt 4.2.3.)

"Pydev code completion: rebuilding modules" error

2007-06-14T12:39:00-07:00

It took me a while to find a solution to this error. I got the following error message when I tried to change the Python interpreter in Pydev: "An internal error occurred during: "Pydev code completion: rebuilding modules"". I was using Eclipse 3.3 RC4 and Python 2.5.1, and had just updated to Pydev 1.3.4. I thought there might be incompatibilities between the versions, so I tried many different combinations, going back as far as Eclipse 3.2 and Python 2.4. I even tried changing the JVM to the BEA JRockit version after the Pydev author mentioned it regarding another problem. (I may still go back to this because it is supposed to be faster.) Then I checked whether my other plugins were causing the problem. I finally discovered that I don't get this error if I close my Pydev project. I'm not sure whether this is a Pydev bug or not.

More example code for PyQt 4.2

2007-06-12T17:40:00-07:00

Here is some more example code for PyQt 4.2.

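from PyQt4.QtCore import *
from PyQt4.QtGui import *
import sys
import os
import re
import datetime

# Note: DEFAULT_DAYS_FROM_LAST_AUDIT, event_desc, and format_datetime are not
# shown in this listing.

################################################################
################################################################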
def main():
    app = QApplication(sys.argv)
    app.setQuitOnLastWindowClosed(True)
    form = StartForm()
    form.show()
    report = ReportWindow()
    app.connect(form, SIGNAL("okClicked"),
                report.create)
    sys.exit(app.exec_())

################################################################
################################################################
class StartForm(QWidget):
    def __init__(self, *args):
        QWidget.__init__(self, *args)

        # position the form on the screen
        self.move(1500, 50)

        # date box
        self.label_date = QLabel()
        self.label_date.setText("Set date of last audit:")
        default = datetime.date.today() - datetime.timedelta(DEFAULT_DAYS_FROM_LAST_AUDIT)
        self.datebox = QDateEdit(QDate(default.year, default.month, default.day))

        # buttons
        spacer = QSpacerItem(20,40,QSizePolicy.Minimum,QSizePolicy.Expanding)
        self.button_ok = QPushButton()
        self.button_ok.setText("OK")
        self.button_ok.setDefault(True)
        button_cancel = QPushButton()
        button_cancel.setText("Cancel")

        # layout
        layout_right = QVBoxLayout(self)
        layout_right.addWidget(self.label_date)
        layout_right.addWidget(self.datebox)
        layout_right.addItem(spacer)
        layout_right.addWidget(self.button_ok)
        layout_right.addWidget(button_cancel)

        # connections
        self.connect(button_cancel, SIGNAL("clicked(bool)"),
                    self.close)
        self.connect(self.button_ok, SIGNAL("clicked(bool)"),
                    self.ok_clicked)

    def ok_clicked(self):
        self.label_date.setText("Getting eventlog data...")
        year = self.datebox.date().year()
        month = self.datebox.date().month()
        day = self.datebox.date().day()
        delta = datetime.date.today() - datetime.date(int(year),int(month),int(day))
        self.emit(SIGNAL("okClicked"), delta.days)
        self.close()

################################################################
################################################################
class ReportWindow(QWidget):
    def __init__(self, *args):
        QWidget.__init__(self, *args)
        self.cbUsers = QCheckBox("Hide SYSTEM users")
        self.cbSorting = QCheckBox("Sorting enabled")
        self.tableview = EventlogTableView()

        vlayout = QVBoxLayout()
        vlayout.addWidget(self.cbUsers)
        vlayout.addWidget(self.cbSorting)
        vlayout.addWidget(self.tableview)
        self.setLayout(vlayout)
        self.setGeometry(100,100,750,550)

        # connections
        self.connect(self.cbUsers, SIGNAL("stateChanged(int)"),
                     self.cbUsersChanged)
        self.connect(self.cbSorting, SIGNAL("stateChanged(int)"),
                     self.cbSortingChanged)

    def create(self, ndays):
        """ Run dumpel.exe, parse and show the results in a table """
        ep = EventlogParser()
        logdata = ep.parseLog(ndays)
        model = EventlogTableModel(logdata, self)
        self.tableview.setModel(model)
        self.tableview.formatData()
        self.show()

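    def cbUsersChanged(self):
        # NOTE: the view is stored as self.tableview (there is no self.table),
        # and EventlogTableView as listed does not yet define
        # show_system_users()/hide_system_users().
        state = self.cbUsers.checkState()
        print "state= " + str(state)
        if state == 0:
            self.tableview.show_system_users()
        elif state == 2:
            self.tableview.hide_system_users()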

    def cbSortingChanged(self):
        # the view is stored as self.tableview; there is no self.table here
        state = self.cbSorting.checkState()
        if state == 0:
            self.tableview.setSortingEnabled(False)
        elif state == 2:
            self.tableview.setSortingEnabled(True)

################################################################
################################################################
class EventlogTableView(QTableView):
    def __init__(self, *args):
        QTableView.__init__(self, *args)

    def formatData(self):
        """ Formats the data in the table view """
        self.resizeColumnsToContents()

        # format items
        model = self.model()
        self.nrows = model.rowCount(self)
        self.ncols = model.columnCount(self)
        for i in xrange(self.nrows):
            for j in xrange(self.ncols):
                index = model.index(i, j)

################################################################
################################################################
class EventlogTableModel(QAbstractTableModel):
    def __init__(self, datain, parent=None):
        QAbstractTableModel.__init__(self, parent)
        self.logdata = datain

    def rowCount(self, parent):
        return len(self.logdata)

    def columnCount(self, parent):
        return len(self.logdata[0])

    def data(self, index, role):
        if not index.isValid():
            return QVariant()
        elif role != Qt.DisplayRole:
            return QVariant()
        return QVariant(self.logdata[index.row()][index.column()])

################################################################
################################################################
class EventlogParser:
    def __init__(self):
        pass

    def parseLog(self, ndays):
        """ Runs dumpel.exe and stores the results in a data structure
            Example dumpel output:
                3/14/2006    1:00:51 PM    8    1    515    Security    NT AUTHORITY\SYSTEM        ANC9PLT561    Win
            Returns a 2-dimensional list
        """
        os.system("dumpel -f dumpel_results.txt -l security -d %s" % ndays)
        fin = open("dumpel_results.txt", "r")
        data = []
        for line in fin.readlines():
            items = re.split(r'\t+', line, maxsplit=8)

            # date processing
            items[0] = format_datetime(items[0], items[1])

            # success/failure processing
            if items[2] == "8":
                items[2] = "Success"
            elif items[2] == "16":
                items[2] = "Failure"
            else:
                items[2] = "Unknown"

            # event processing
            event = items[4]
            try:
                desc = event_desc[event]
            except:
                desc = 'need to look up this event id'
            items[4] = event + ": " + desc

            # user processing
            (dom, user) = re.split(r"\\", items[6])
            items[6] = user

            # delete unused columns
            items[7:8] = []
            items[5:6] = []
            items[3:4] = []

            data.append(items)
        print "Parse successful.\n"
        return data
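
The per-line cleanup that parseLog performs can be tried without dumpel.exe or PyQt. Below is a minimal Python 3 sketch under a few assumptions: parse_dumpel_line and TYPE_NAMES are names made up for this illustration, the sample line is modeled on the docstring example above with tabs between the fields, and the date/time merge is left out because format_datetime is not shown in the post.

```python
import re

# Type codes as used in the listing above; anything else is "Unknown".
TYPE_NAMES = {"8": "Success", "16": "Failure"}


def parse_dumpel_line(line, event_desc=None):
    """Split one tab-delimited dumpel line and clean up its fields.

    event_desc is an optional dict mapping event id -> description
    (a stand-in for the event_desc table the original code relies on).
    """
    event_desc = event_desc or {}
    items = re.split(r"\t+", line.rstrip("\n"), maxsplit=8)
    if len(items) < 7:
        raise ValueError("unexpected dumpel line: %r" % line)

    # success/failure processing
    items[2] = TYPE_NAMES.get(items[2], "Unknown")

    # event processing
    event = items[4]
    items[4] = event + ": " + event_desc.get(event, "need to look up this event id")

    # user processing: keep the part after the last backslash (DOMAIN\user)
    items[6] = items[6].rpartition("\\")[2]

    # delete unused columns, highest index first so positions stay valid
    for idx in (7, 5, 3):
        del items[idx:idx + 1]
    return items


sample = "3/14/2006\t1:00:51 PM\t8\t1\t515\tSecurity\tNT AUTHORITY\\SYSTEM\t\tANC9PLT561\tWin"
print(parse_dumpel_line(sample))
```

Using dict.get for the lookups avoids the bare except of the original, and rpartition tolerates a user field that has no domain prefix.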

PyQt 4.2 QAbstractTableModel/QTableView Example

2007-06-11T09:39:00-07:00

from PyQt4.QtCore import *
from PyQt4.QtGui import *
import sys

my_array = [['00','01','02'],
            ['10','11','12'],
            ['20','21','22']]

def main():
    app = QApplication(sys.argv)
    w = MyWindow()
    w.show()
    sys.exit(app.exec_())

class MyWindow(QWidget):
    def __init__(self, *args):
        QWidget.__init__(self, *args)

        tablemodel = MyTableModel(my_array, self)
        tableview = QTableView()
        tableview.setModel(tablemodel)

        layout = QVBoxLayout(self)
        layout.addWidget(tableview)
        self.setLayout(layout)

class MyTableModel(QAbstractTableModel):
    def __init__(self, datain, parent=None, *args):
        QAbstractTableModel.__init__(self, parent, *args)
        self.arraydata = datain

    def rowCount(self, parent):
        return len(self.arraydata)

    def columnCount(self, parent):
        return len(self.arraydata[0])

    def data(self, index, role):
        if not index.isValid():
            return QVariant()
        elif role != Qt.DisplayRole:
            return QVariant()
        return QVariant(self.arraydata[index.row()][index.column()])

if __name__ == "__main__":
    main()

Python os.walk example

2007-03-15T11:37:00-07:00

Here is a simple os.walk() example which walks your directory tree and returns the path, a list of directories, and a list of files:

import os

path = "c:/python25"

i = 0
for (path, dirs, files) in os.walk(path):
    print path
    print dirs
    print files
    print "----"
    i += 1
    if i >= 4:
        break

Here are the results:

c:/python25
['DLLs', 'Doc', 'include', 'Lib', 'libs', 'tcl', 'Tools']
['LICENSE.txt', 'NEWS.txt', 'pylupdate4.exe', 'pyrcc4.exe', 'python.exe', 'pythonw.exe', 'pyuic4.bat', 'README.txt', 'temp.py', 'w9xpopen.exe']
----
c:/python25\DLLs
[]
['bz2.pyd', 'py.ico', 'pyc.ico', 'pyexpat.pyd', 'select.pyd', 'sqlite3.dll', 'tcl84.dll', 'tclpip84.dll', 'tk84.dll', 'unicodedata.pyd', 'winsound.pyd', '_bsddb.pyd', '_ctypes.pyd', '_ctypes_test.pyd', '_elementtree.pyd', '_hashlib.pyd', '_msi.pyd', '_socket.pyd', '_sqlite3.pyd', '_ssl.pyd', '_testcapi.pyd', '_tkinter.pyd']
----
c:/python25\Doc
[]
['Python25.chm']
----
c:/python25\include
[]
['abstract.h', 'asdl.h', 'ast.h', 'bitset.h', 'boolobject.h', 'bufferobject.h', 'cellobject.h', 'ceval.h', 'classobject.h', 'cobject.h', 'code.h', 'codecs.h', 'compile.h', 'complexobject.h', 'cStringIO.h', 'datetime.h', 'descrobject.h', 'dictobject.h', 'enumobject.h', 'errcode.h', 'eval.h', 'fileobject.h', 'floatobject.h', 'frameobject.h', 'funcobject.h', 'genobject.h', 'graminit.h', 'grammar.h', 'import.h', 'intobject.h', 'intrcheck.h', 'iterobject.h', 'listobject.h', 'longintrepr.h', 'longobject.h', 'marshal.h', 'metagrammar.h', 'methodobject.h', 'modsupport.h', 'moduleobject.h', 'node.h', 'object.h', 'objimpl.h', 'opcode.h', 'osdefs.h', 'parsetok.h', 'patchlevel.h', 'pgen.h', 'pgenheaders.h', 'pyarena.h', 'pyconfig.h', 'pydebug.h', 'pyerrors.h', 'pyexpat.h', 'pyfpe.h', 'pygetopt.h', 'pymactoolbox.h', 'pymem.h', 'pyport.h', 'pystate.h', 'pystrtod.h', 'Python-ast.h', 'Python.h', 'pythonrun.h', 'pythread.h', 'py_curses.h', 'rangeobject.h', 'setobject.h', 'sliceobject.h', 'stringobject.h', 'structmember.h', 'structseq.h', 'symtable.h', 'sysmodule.h', 'timefuncs.h', 'token.h', 'traceback.h', 'tupleobject.h', 'ucnhash.h', 'unicodeobject.h', 'weakrefobject.h']
----
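
For Python 3, where print is a function, the same walk can be sketched as below. This is a minimal sketch rather than the original script: walk_summary and its limit parameter are names invented here, and the demo walks a throwaway temporary directory so that it runs on any platform.

```python
import os
import tempfile


def walk_summary(top, limit=4):
    """Collect up to `limit` (dirpath, dirnames, filenames) tuples from os.walk."""
    results = []
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # sorting in place also fixes the order of descent
        results.append((dirpath, list(dirnames), sorted(filenames)))
        if len(results) >= limit:
            break
    return results


# Build a small tree so the example does not depend on c:/python25 existing.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "DLLs"))
    os.makedirs(os.path.join(top, "Doc"))
    open(os.path.join(top, "README.txt"), "w").close()

    for dirpath, dirnames, filenames in walk_summary(top):
        print(dirpath)
        print(dirnames)
        print(filenames)
        print("----")
```

Unpacking into dirpath instead of path keeps the starting directory and the loop variable separate, which the original script does not do.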

How to use python and popen4 to capture stdout and stderr from a command

2007-03-12T15:10:00-07:00

You can use popen to capture stdout from a command:

import os
stdout = os.popen("dir asdkfhqweiory")
print stdout.read()

And your output will be something like:

>>> ================================ RESTART ================================
 >>> 
  Volume in drive C has no label.
  Volume Serial Number is XXXXXXXX
 
  Directory of C:\Python25
 
 
 >>>

If you wanted the error message, popen won't give it to you. To capture both stdout and stderr, use popen4:

import os

(dummy, stdout_and_stderr) = os.popen4("dir asdkfhqweiory")
print stdout_and_stderr.read()

This will give you the following output (which includes the error message):

>>> ================================ RESTART ================================
 >>> 
  Volume in drive C has no label.
  Volume Serial Number is XXXXXXXX
 
  Directory of C:\Python25
 
 File Not Found
 
 >>>

See http://docs.python.org/lib/os-newstreams.html for more information.
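
os.popen4 only exists in Python 2; it was removed in Python 3 in favor of the subprocess module. A minimal sketch of the same idea with subprocess follows. The helper name run_and_capture is made up for this example, text=True assumes Python 3.7 or later, and the demo runs a small Python child process instead of dir so that it behaves the same on every platform.

```python
import subprocess
import sys


def run_and_capture(command, shell=False):
    """Run a command and return (exit_code, combined stdout and stderr text)."""
    proc = subprocess.run(
        command,
        shell=shell,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like popen4 did
        text=True,
    )
    return proc.returncode, proc.stdout


# A child process that writes one line to stdout and one line to stderr.
child = [sys.executable, "-c",
         "import sys; print('to stdout'); sys.stderr.write('to stderr\\n')"]
code, output = run_and_capture(child)
print(code)
print(output)
```

On Windows, dir is built into cmd.exe rather than being a separate program, so the original command would be passed as a string with shell=True, for example run_and_capture("dir asdkfhqweiory", shell=True).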

Eclipse/Pydev "Exception occurred executing command line" error

2007-01-30T11:12:00-08:00

Eclipse 3.2, PyDev 1.2.5 error:
Exception occurred executing command line.
There is a problem with your Python interpreter. To fix, go to "Window", "Preferences", "Pydev", "Interpreter - Python", and make sure your python interpreter is set up correctly.

PyQt4 Examples

2007-01-25T17:54:00-08:00

PyQt is the best GUI toolkit for Python. Here are some examples and notes I am in the process of collecting.

There is a good collection of tutorials and examples included with the PyQt 4 installation. On Windows, go to "Start" -> "All Programs" -> "PyQt GPL v4.x.x" -> "Examples Source" for many tutorials and examples ported from C++/Qt to Python/PyQt.

Update 2008-09-28: For Ubuntu Linux, the examples are part of the python-qt4-doc package. Execute sudo apt-get install python-qt4-doc then navigate to /usr/share/doc/python-qt4-doc/examples.

How to install PyQt 4.3 and Python 2.5 on Windows

2007-01-25T10:35:00-08:00

Update 1/10/2008: PyQt has made the install process a lot simpler because it now bundles everything you need in one installer, including the Qt 4.3 open source edition. Now all you need to do is install Python and the PyQt bundle. The updated steps follow immediately; the old instructions are below them.

Update 7/1/2008: Updated for PyQt 4.4.2

NEW INSTRUCTIONS

Here are the steps to install and create a simple "Hello World" GUI application using PyQt 4.4 and Python 2.5 on Windows.

Install Python 2.5

Go to http://www.python.org/download/ and click on "Python 2.5.x Windows installer"
Save and run the Windows installer
Go through the steps and accept the defaults.

Install the PyQt 4.4 bundle (including Qt 4.4)

Go to http://www.riverbankcomputing.co.uk/software/pyqt/download and select the "PyQt-Py2.5-gpl-4.4.2-1.exe" link.
Save and run the file.
Go through the steps and accept the defaults.

Run a "Hello World" application

Go to "Start" -> "All Programs" -> "Python 2.5" -> "IDLE (Python GUI)"
Open a new window ("File" -> "New window")

Type the following code inside and save:

import sys
from PyQt4.QtGui import *
app = QApplication(sys.argv)
button = QPushButton("Hello World", None)
button.show()
app.exec_()

Hit "F5" to run. A window with a single push button should pop up.

For more examples, go to "Start" -> "All Programs" -> "PyQt GPL v4.4.2 for Python v2.5" -> "Examples" -> "PyQt Examples Source". (For a default installation, this is also located at C:\Python25\PyQt4\examples.) To start, look in the "tutorial" directory.

OLD INSTRUCTIONS

Here are the steps to install and create a simple "Hello World" GUI application using PyQt 4.1.1, Python 2.5, and Qt 4.2.2 Open Source edition (GPL) on Windows XP with the MinGW compiler.

Install Python 2.5

Go to http://www.python.org/download/ and click on "Python 2.5 Windows installer"
Save and run the Windows installer
Go through the steps and accept the defaults.

Install MinGW

Go to http://www.mingw.org/download.shtml
Download the following "bin" files from the "Current" section:
- gcc-core-3.4.2-20040916-1.tar.gz
- gcc-g++-3.4.2-20040916-1.tar.gz
- mingw-runtime-3.9.tar.gz
- w32api-3.6.tar.gz
Extract all the files to "c:\mingw"

Install Qt 4.2.2 Open Source edition

Go to the Open Source download page at http://www.trolltech.com/developer/downloads/qt/windows. Note there is also an Evaluation version. This is *not* the one you want.
Under the "Download" heading, select the "http://ftp.iasi.roedu.net/mirrors/ftp.trolltech.com/qt/source/qt-win-opensource-4.2.2-mingw.exe" link.
Go through the steps and accept the defaults.
When you get to the MinGW page, leave the "Download and install minimal MinGW installation" box unchecked and make sure the location of the MinGW installation is set to "c:\mingw". Click "Install".
You will get an error message which says that the installer could not find a valid "w32api.h" file. You can install the 3.2 version from the mingw site, but the 3.6 version works. Click "Yes" to continue. Click "Finish" to finish the installation.

Install PyQt 4.1.1

Go to http://www.riverbankcomputing.co.uk/pyqt/download.php and select the "PyQt-gpl-4.1.1-Py2.5-Qt4.2.2.exe" link.
Save and run the file.
Go through the steps and accept the defaults.

Check your Environment Variables

Right-click on "My Computer" and select "Properties"
Click the "Advanced" tab
Click "Environment Variables"
The following variables should be set:
- user variable QTDIR - "c:\qt\4.2.2"
- user variable QMAKESPEC - "win32-g++"
- system variable PATH - include "C:\Qt\4.2.2\bin;C:\Python25\Scripts;C:\Python25;C:\Python25\DLLs;"

Run a "Hello World" application

Go to "Start" -> "All Programs" -> "Python 2.5" -> "IDLE (Python GUI)"
Open a new window ("File" -> "New window")

Type the following code inside and save:

import sys
from PyQt4.QtGui import *
app = QApplication(sys.argv)
button = QPushButton("Hello World", None)
button.show()
app.exec_()

Hit "F5" to run.


How to install Easy Install for Python

2007-01-10T17:50:00-08:00

Update 2009-03-31:

For current installation instructions on Windows and Cygwin, see http://pypi.python.org/pypi/setuptools.
For a better alternative to Easy Install, see pip by Ian Bicking

How to install Easy Install on Ubuntu Linux

$ sudo apt-get install python-setuptools python-dev build-essential

How to install Easy Install on Windows

Go to http://peak.telecommunity.com/DevCenter/EasyInstall#installing-easy-install
Right click on "ez_setup.py" and save the file to "c:\temp"
Open a cmd.exe prompt
"cd c:\temp"
"python ez_setup.py"

How to install Easy Install on Cygwin

$ cd /tmp
$ wget http://peak.telecommunity.com/dist/ez_setup.py
$ python ez_setup.py

How to draw a simple line using python and the matplotlib API

2007-01-05T18:08:00-08:00

I'm continuing to learn the low-level, object-oriented matplotlib API. My goal is to create very customizable, perfect plots. Here is how to draw a simple line. First create a figure that is 4 inches by 4 inches. Then create some axes with a 10% margin around each edge. Then add the axes to the figure. Then create a line from (0,0) to (1,1). Then add the line to the axes. Then create a canvas. Then create the .png file. Looks like good object-oriented Python fun to me...

""" line_ex.py                                             
 """                                                        
from matplotlib.figure import Figure
from matplotlib.axes import Axes
from matplotlib.lines import Line2D
from matplotlib.backends.backend_agg import FigureCanvasAgg

fig = Figure(figsize=[4,4])
ax = Axes(fig, [.1,.1,.8,.8])
fig.add_axes(ax)
l = Line2D([0,1],[0,1])
ax.add_line(l)

canvas = FigureCanvasAgg(fig)
canvas.print_figure("line_ex.png")

How to use the pylab API vs. the matplotlib API

2007-01-04T16:59:00-08:00

This article has a good description of the two APIs in matplotlib: the pylab API and the matplotlib API. I've been using the pylab interface because it is easier, especially coming from a MATLAB background. But I wanted direct access to the matplotlib classes, so I needed to use the matplotlib API. Here is a simple example that creates a .png figure using the two different APIs.

Here is the example using the pylab API:

""" api_pylab.py
"""
from pylab import *

figure(figsize=[4,4])
axes([.1,.1,.8,.8])
scatter([1,2],[3,4])
savefig('api_pylab.png')

Here is the example using the matplotlib API:

""" api_matplotlib.py
"""
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg

fig = Figure(figsize=[4,4])
ax = fig.add_axes([.1,.1,.8,.8])
ax.scatter([1,2], [3,4])
canvas = FigureCanvasAgg(fig)
canvas.print_figure("api_matplotlib.png")

How to create some derived arrow classes with matplotlib and python

2007-01-03T10:41:00-08:00

Here is an example of how to create some derived arrow classes with matplotlib and python. The arrow() function in matplotlib accepts the origin plus delta x and delta y inputs. I changed this to polar coordinates, so Arrow2 accepts the x and y coordinates of the origin, the length, and the angle in degrees. Then I created 4 classes derived from Arrow2 called ArrowRight, ArrowLeft, ArrowUp, and ArrowDown. These just set the angle for you to 0, 180, 90, and 270 respectively. Notice too that the **kwargs can be passed down so you can still set all the other parameters.

""" arrow_ex2.py """ 
from pylab import *  

def main(): 
    figure() 
    axes() 
    Arrow2(.5,.5,.2,45) 
    ArrowRight(.5,.5,.2) 
    ArrowLeft(.5,.5,.2) 
    ArrowUp(.5,.5,.2) 
    ArrowDown(.5,.5,.2) 
    show() 

class Arrow2: 
    def __init__(self, x0, y0, length, angle=0.0, color='k', width=0.01, **kwargs):          
        dx = length*cos(angle*pi/180) 
        dy = length*sin(angle*pi/180) 
        arrow (x0, y0, dx, dy, 
               width=width, 
               edgecolor=color,  
               facecolor=color,  
               antialiased=True,  
               head_width=5*width, 
               head_length=7.5*width, 
               **kwargs)

class ArrowRight(Arrow2): 
    def __init__(self, x0, y0, length, **kwargs): 
        Arrow2.__init__(self, x0, y0, length, angle=0.0, **kwargs)

class ArrowLeft(Arrow2): 
    def __init__(self, x0, y0, length, **kwargs): 
        Arrow2.__init__(self, x0, y0, length, angle=180.0, **kwargs)                      

class ArrowUp(Arrow2): 
    def __init__(self, x0, y0, length, **kwargs): 
        Arrow2.__init__(self, x0, y0, length, angle=90.0, **kwargs) 

class ArrowDown(Arrow2): 
    def __init__(self, x0, y0, length, **kwargs): 
        Arrow2.__init__(self, x0, y0, length, angle=270.0, **kwargs) 

if __name__ == "__main__": 
    main()
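
The polar-to-Cartesian step inside Arrow2 can be tried on its own, without opening a plot window. This is only a sketch using the standard math module; the function name polar_to_delta is made up for illustration and does not appear in the script above.

```python
from math import cos, sin, radians


def polar_to_delta(length, angle_deg):
    """Convert a length and an angle in degrees into (dx, dy),
    the same arithmetic Arrow2 does before calling arrow()."""
    theta = radians(angle_deg)
    return length * cos(theta), length * sin(theta)


# The four fixed angles used by ArrowRight, ArrowUp, ArrowLeft and ArrowDown
for name, angle in [("right", 0), ("up", 90), ("left", 180), ("down", 270)]:
    dx, dy = polar_to_delta(0.2, angle)
    print("%-5s dx=% .3f dy=% .3f" % (name, dx, dy))
```

Because of floating point rounding, the component that should be exactly zero comes out as a very small number instead, which is harmless for plotting.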

How to draw an arrow with matplotlib and python

2007-01-02T14:21:00-08:00

Here is an example of how to draw an arrow with matplotlib. It should be
very easy, but I had to change the width setting so the arrow head would
not be too small. Further documentation is here:
http://matplotlib.sourceforge.net/matplotlib.pylab.html#-arrow

""" arrow_ex.py
"""
from pylab import *
figure()
axes()
arrow(.1,.1,.2,.2, width=0.01)
show()

Example pie charts using python and matplotlib

2006-12-19T10:33:00-08:00

I needed to make some pie charts and didn't like the results I got from Excel. It was too hard to customize the plots exactly the way I wanted them. I have used Matlab before and I preferred Matlab to Excel. However, Python is my favorite thing to use so I searched for python and matlab on Google and found matplotlib. Matplotlib is a matlab-like plotting library for Python. You can get matplotlib from http://matplotlib.sourceforge.net/, but it is also bundled with the Enthought version of Python so I got it from there. Update: I realized that the Enthought bundle didn't include the latest version of matplotlib so I installed the latest version of matplotlib and the required NumPy as well.

Step-by-step:
1. Download the Enthought version of Python 2.4.3 from http://code.enthought.com/enthon/ (click the "enthon-python2.4-1.0.0.exe" link at the bottom of the page) and install it.
2. Download "numpy-1.0.1.win32-py2.4.exe" from http://sourceforge.net/project/showfiles.php?group_id=1369 and install it.
3. Download "matplotlib-0.87.7.win32-py2.4.exe" from http://sourceforge.net/projects/matplotlib and install it.
4. Open a text editor and type this inside:

#!/usr/bin/env python
"""
http://matplotlib.sf.net/matplotlib.pylab.html#-pie for the docstring.
"""
from pylab import *

# create figure
figwidth = 10.0    # inches
figheight = 3.5   # inches
figure(1, figsize=(figwidth, figheight))
rcParams['font.size'] = 12.0
rcParams['axes.titlesize'] = 16.0
rcParams['xtick.labelsize'] = 12.0
rcParams['legend.fontsize'] = 12.0
explode=(0.05, 0.0)
colors=('b','g')
Ncols = 3
plotheight = figwidth/Ncols
H = plotheight/figheight
W = 1.0 / Ncols
margin = 0.1
left = [W*margin, W*(1+margin), W*(2+margin)]
bottom = H*margin
width = W*(1-2*margin)
height = H*(1-2*margin)

# cpu utilization
utilized = 10.0
free = 100.0 - utilized
fracs = [utilized, free]
axes([left[0], bottom, width, height])
patches = pie(fracs, colors=colors, explode=explode, autopct='%1.f%%', shadow=True)
title('CPU Throughput')
legend((patches[0], patches[2]), ('Processing', 'Idle'), loc=(0,-.05))

# ROM utilization
utilized = 30.0
free = 100.0 - utilized
fracs = [utilized, free]
axes([left[1], bottom, width, height])
patches = pie(fracs, colors=colors, explode=explode, autopct='%1.f%%', shadow=True)
title('ROM Memory Usage')
legend((patches[0], patches[2]), ('Used', 'Unused'), loc=(0,-.05))

# RAM utilization
utilized = 15.0
free = 100.0 - utilized
fracs = [utilized, free]
axes([left[2], bottom, width, height])
patches = pie(fracs, colors=colors, explode=explode, autopct='%1.f%%', shadow=True)
title('RAM Memory Usage')
legend((patches[0], patches[2]), ('Used', 'Unused'), loc=(0,-.05))

savefig('utilization')
show()

5. Save the file as piechart.py in c:\temp
6. In Windows, go to Start -> All Programs -> Python 2.4 (Enthought Edition) -> IPython Shell
7. Type in "cd c:\temp"
8. Type "run piechart.py" and hit enter
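
The layout arithmetic in piechart.py (W, H, left, bottom, width, height) is easier to check in isolation. The following is a sketch that repeats the same formulas for any number of columns; the function name pie_axes_rects is hypothetical and the code needs only plain Python, not pylab.

```python
def pie_axes_rects(figwidth, figheight, ncols, margin=0.1):
    """Return one [left, bottom, width, height] rectangle per column,
    in figure fractions, using the same formulas as piechart.py."""
    W = 1.0 / ncols                      # each column gets an equal share of the width
    H = (figwidth / ncols) / figheight   # height fraction that makes each cell square in inches
    return [
        [W * (i + margin), H * margin, W * (1 - 2 * margin), H * (1 - 2 * margin)]
        for i in range(ncols)
    ]


# Same numbers as the script: a 10 x 3.5 inch figure with three pies
for rect in pie_axes_rects(10.0, 3.5, 3):
    print(["%.4f" % v for v in rect])
```

Multiplying the width fraction by figwidth and the height fraction by figheight gives the same length in inches, which is why the pies come out round instead of stretched.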

Larger python qt pyqt example

2006-12-18T12:04:00-08:00

This script parses Windows Event Viewer logs. It uses dumpel.exe to dump the security log, colors significant events, and presents the results in a Qt GUI.

"""windows_audit.py
"""

__author__ = "So Feng"
__version__ = "$Revision: 1.0 $"
__date__ = "$Date: 2006/10/17 $"
__license__ = "Python"

import re
import datetime
from datetime import date
import time
import glob
import os
import sys
# PyQt4 bindings for Qt 4
from PyQt4.QtCore import *
from PyQt4.QtGui import *
import StringIO

# constants
NCHARS_TO_WRAP = 30
EARLY_HOUR = 6
LATE_HOUR = 20
outfile = "audit_" + str(datetime.date.today()) + ".txt"
red_eventids = ["560","565","592","678"]
orange_eventids = ["539","629","644","531","544","545","675","676"]
yellow_eventids = ["529","530","532","533","534","535","536","537","681",
                   "576","608","609","610","611","624","625","626","627",
                   "628","630","631","632","633","634","635","636","637",
                   "638","639","640","641","642","643","645","646","647",
                   "648","649","650","651","652","653","654","655","656",
                   "657","658","659","660","661","662","663","664","665",
                   "666","667","668","669","670","672"]
green_eventids = ["528", "538", "540", "680"]
other_eventids = ["612", "617", "618", "619"]
failure_list = ('529', '530', '531', '532', '533', '534', '535', '536', '537',
                '539', '544', '545', '546', '547', '616', '675', '676', '677',
                '679', '681', )
success_list = ('512', '513', '514', '515', '516', '517', '518', '528', '538',
                '540', '541', '542', '543', '560', '561', '562', '563', '564',
                '565', '566', '576', '577', '578', '592', '593', '594', '595',
                '608', '609', '610', '611', '612', '613', '614', '615', '617',
                '618', '619', '620', '624', '625', '626', '627', '628', '630',
                '631', '632', '633', '634', '635', '636', '637', '638', '639',
                '640', '641', '642', '643', '644', '645', '646', '647', '648',
                '649', '650', '651', '652', '653', '654', '655', '656', '657',
                '658', '659', '660', '661', '662', '663', '664', '665', '666',
                '667', '668', '669', '670', '672', '673', '674', '678', '680',
                '682', '683', )
event_desc = {
'512': 'Windows NT is starting up. ',
'513': 'Windows NT is shutting down.\n             All logon sessions will be terminated by this shutdown. ',
'514': 'An authentication package has been loaded by the Local Security Authority.\n             This authentication package will be used to authenticate logon attempts. ',
'515': 'A trusted logon process has registered with the Local Security Authority.\n             This logon process will be trusted to submit logon requests. ',
'516': 'Internal resources allocated for the queuing of audit messages have been\n             exhausted, leading to the loss of some audits. ',
'517': 'The audit log was cleared ',
'518': 'An notification package has been loaded by the Security Account Manager.\n             This package will be notified of any account or password changes. ',
'528': 'Successful Logon: ',
'529': 'Logon Failure Unknown user name or bad password',
'530': 'Logon Failure Account logon time restriction violation',
'531': 'Logon Failure Account currently disabled',
'532': 'Logon Failure The specified user account has expired',
'533': 'Logon Failure User not allowed to logon at this computer',
'534': 'Logon Failure The user has not been granted the requested \n             logon type at this machine',
'535': 'Logon Failure The specified account\'s password has expired',
'536': 'Logon Failure The NetLogon component is not active',
'537': 'Logon Failure An unexpected error occurred during logon',
'538': 'User Logoff ',

'539': 'Logon Failure Account locked out',
'540': 'Successful Network Logon ',
'541': 'IKE security association established. ',
'542': 'IKE security association ended. ',
'543': 'IKE security association ended. ',
'544': 'IKE security association establishment failed because peer could not\n             authenticate. The certificate trust could not be established. ',
'545': 'IKE peer authentication failed. ',
'546': 'IKE security association establishment failed because peer\n             sent invalid proposal. ',
'547': 'IKE security association negotiation failed. ',
'560': 'Object Open ',
'561': 'Handle Allocated ',
'562': 'Handle Closed ',
'563': 'Object Open for Delete ',
'564': 'Object Deleted ',
'565': 'Object Open ',
'566': 'Object Operation ',
'576': 'Special privileges assigned to new logon: ',
'577': 'Privileged Service Called ',
'578': 'Privileged object operation ',
'592': 'A new process has been created ',
'593': 'A process has exited ',
'594': 'A handle to an object has been duplicated ',
'595': 'Indirect access to an object has been obtained ',
'608': 'User Right Assigned ',
'609': 'User Right Removed ',
'610': 'New Trusted Domain ',
'611': 'Removing Trusted Domain ',
'612': 'Audit Policy Change ',
'613': 'IPSec policy agent started ',
'614': 'IPSec policy agent disabled ',
'615': 'IPSEC PolicyAgent Service:  %1 ',
'616': 'IPSec policy agent encountered a potentially serious failure. ',
'617': 'Kerberos Policy Changed ',
'618': 'Encrypted Data Recovery Policy Changed ',
'619': 'Quality of Service Policy Changed ',
'620': 'Trusted Domain Information Modified: ',
'624': 'User Account Created ',
'625': 'User Account Type Change ',
'626': 'User Account Enabled ',
'627': 'Change Password Attempt ',
'628': 'User Account password set ',
'630': 'User Account Deleted: ',
'631': 'Security Enabled Global Group Created ',
'632': 'Security Enabled Global Group Member Added ',
'633': 'Security Enabled Global Group Member Removed ',
'634': 'Security Enabled Global Group Deleted ',
'635': 'Security Enabled Local Group Created ',
'636': 'Security Enabled Local Group Member Added ',
'637': 'Security Enabled Local Group Member Removed ',
'638': 'Security Enabled Local Group Deleted ',
'639': 'Security Enabled Local Group Changed ',
'640': 'General Account Database Change ',
'641': 'Security Enabled Global Group Changed ',
'642': 'User Account Changed ',
'643': 'Domain Policy Changed: %1 modified ',
'644': 'User Account Locked Out ',
'645': 'Computer Account Created ',
'646': 'Computer Account Changed ',
'647': 'Computer Account Deleted ',
'648': 'Security Disabled Local Group Created ',
'649': 'Security Disabled Local Group Changed ',
'650': 'Security Disabled Local Group Member Added ',
'651': 'Security Disabled Local Group Member Removed ',
'652': 'Security Disabled Local Group Deleted ',
'653': 'Security Disabled Global Group Created ',
'654': 'Security Disabled Global Group Changed ',
'655': 'Security Disabled Global Group Member Added ',
'656': 'Security Disabled Global Group Member Removed ',
'657': 'Security Disabled Global Group Deleted ',
'658': 'Security Enabled Universal Group Created ',
'659': 'Security Enabled Universal Group Changed ',
'660': 'Security Enabled Universal Group Member Added ',
'661': 'Security Enabled Universal Group Member Removed ',
'662': 'Security Enabled Universal Group Deleted ',
'663': 'Security Disabled Universal Group Created ',
'664': 'Security Disabled Universal Group Changed ',
'665': 'Security Disabled Universal Group Member Added ',
'666': 'Security Disabled Universal Group Member Removed ',
'667': 'Security Disabled Universal Group Deleted ',
'668': 'Group Type Changed ',
'669': 'Add SID History ',
'670': 'Add SID History ',
'672': 'Authentication Ticket Granted ',
'673': 'Service Ticket Granted ',
'674': 'Ticket Granted Renewed ',
'675': 'Pre-authentication failed ',
'676': 'Authentication Ticket Request Failed ',
'677': 'Authentication Ticket Request Failed ',
'678': 'Account Mapped for Logon by: %1 ',
'679': 'The name: %2 could not be mapped for logon by: %1 ',
'680': 'Account Used for Logon by: %1 ',
'681': 'The logon to account: %2 by: %1 from workstation: %3 failed. The error code was: %4 ',
'682': 'Session reconnected to winstation: ',
'683': 'Session disconnected from winstation: ',
              }
#sys.stderr = StringIO.StringIO()
#sys.stdout = StringIO.StringIO()
################################################################            
################################################################
def main():
    app = QApplication(sys.argv)
    app.setQuitOnLastWindowClosed(True)
    form = StartForm()
    form.show()
    report = ReportWindow()
    app.connect(form, SIGNAL("okClicked"),
                report.create)
    sys.exit(app.exec_())
    
################################################################            
################################################################
class StartForm(QWidget):
    def __init__(self, *args):
        QWidget.__init__(self, *args)
        
#        # directory tree
#        label_file = QLabel()
#        label_file.setText("Select file:")
#        dirmodel = QDirModel()
#        treeview = QTreeView(self)
#        treeview.setModel(dirmodel)
#        treeview.setRootIndex(dirmodel.index(QDir.currentPath()))
#        treeview.hideColumn(1)
#        treeview.hideColumn(2)
#        treeview.hideColumn(3)
#        treeview.header().hide()
        
        # date box
        self.label_date = QLabel()
        self.label_date.setText("Set date of last audit:")
        default = datetime.date.today() - datetime.timedelta(10)
        self.datebox = QDateEdit(QDate(default.year, default.month, default.day))
        
        # buttons
        spacer = QSpacerItem(20,40,QSizePolicy.Minimum,QSizePolicy.Expanding)
        self.button_ok = QPushButton()
        self.button_ok.setText("OK")
        self.button_ok.setDefault(True)
        button_cancel = QPushButton()
        button_cancel.setText("Cancel")
    
        # layout
#        layout_left = QVBoxLayout()
#        layout_left.addWidget(label_file)
#        layout_left.addWidget(treeview)
        layout_right = QVBoxLayout(self)
        layout_right.addWidget(self.label_date)
        layout_right.addWidget(self.datebox)
        layout_right.addItem(spacer)
        layout_right.addWidget(self.button_ok)
        layout_right.addWidget(button_cancel)
#        layout = QHBoxLayout(self)
#        layout.addLayout(layout_left)
#        layout.addLayout(layout_right)
        
        # connections
        self.connect(button_cancel, SIGNAL("clicked()"),
                    self.close)
        self.connect(self.button_ok, SIGNAL("clicked()"),
                    self.ok_clicked)
        
    def ok_clicked(self):
        self.label_date.setText("Getting eventlog data...")
        year = self.datebox.date().year()
        month = self.datebox.date().month()
        day = self.datebox.date().day()
        delta = datetime.date.today() - datetime.date(int(year),int(month),int(day))
        self.emit(SIGNAL("okClicked"), delta.days)
        self.close()
        
################################################################            
################################################################
class ReportWindow(QMainWindow):
    def __init__(self, *args):
        QMainWindow.__init__(self, *args)
#        self.cb = QCheckBox()
        self.table = MyTable()
        self.textbrowser = QTextBrowser()
        self.textbrowser.setFontFamily("Courier")
        self.textbrowser.setFontPointSize(10)
        splitter = QSplitter(Qt.Vertical, self)
#        splitter = QSplitter(Qt.Vertical)
        splitter.addWidget(self.table)
        splitter.addWidget(self.textbrowser)
#        vlayout = QVBoxLayout(self)
#        vlayout.addWidget(self.cb)
#        vlayout.addWidget(splitter)        
        self.setGeometry(100,100,750,550)
        self.setCentralWidget(splitter)
        
    def create(self, date):
        self.table.parse_event_log(date) 
        self.table.setdata()

        fh = open(outfile, "r")
        text = fh.read()
        self.textbrowser.setPlainText(text)
        
        self.show()

################################################################
################################################################
class MyTable(QTableWidget):
    def __init__(self, *args):
        QTableWidget.__init__(self, *args)
        self.setSortingEnabled(True)
        self.setSelectionMode(self.ContiguousSelection)
        #self.setFixedWidth(750)
        self.setGeometry(0,0,700,400)
        self.data = []
    
    def setdata(self):
        if len(self.data) == 0:
            self.setRowCount(1)
            self.setColumnCount(1)
            newitem = QTableWidgetItem("No data for this date range.")
            self.setItem(0, 0, newitem)
            self.resizeColumnsToContents()
        else:
            self.nrows = len(self.data)
            self.ncols = len(self.data[0])
            self.setRowCount(self.nrows)
            self.setColumnCount(self.ncols)
            self.setmydata_list()
            self.resizeColumnsToContents()
            self.setGridStyle(Qt.DotLine)
            self.setShowGrid(False)
            self.setColumnWidth(3, 250)
            #self.resizeRowsToContents()
            #self.setHorizontalHeaderLabels(['date','time','?','?','event','source','user','?','computer','data'])
            self.setHorizontalHeaderLabels(['Date','Time','Type','Event','User','Computer','Data'])

    def setmydata_list(self):    
        n = 0
        date_prev = ''
        for row in self.data:
            # event processing
            event = row[4]
            try:
                row[4] = event + ": " + event_desc[event]
            except KeyError:  # no description for this event ID
                pass
            color = QColor("white")
            if event in red_eventids:
                color = QColor("red")
            elif event in orange_eventids:
                color = QColor("orange")
            elif event in yellow_eventids:
                color = QColor("yellow")
            elif event in green_eventids:
                #color = QColor("green")
                pass
            elif event in other_eventids:
                color = QColor("blue")
            
            # success or failure processing
            if row[2] == "8":
                row[2] = "Success"
                icon = QIcon("success.png")
            elif row[2] == "16":
                row[2] = "Failure"
                icon = QIcon("failure.png")
            else:
                row[2] = "Unknown"
                icon = QIcon("unknown.png")
                
            # date processing
            row[0] = format_datetime(row[0], row[1])
            (date, thetime) = re.split(r"\n", row[0])
            newday = False
            if date != date_prev:
                newday = True
            date_prev = date
            thetime = re.split(r":", thetime)
            hour = int(thetime[0])
            if hour <= EARLY_HOUR or hour >= LATE_HOUR:
                colordate = QColor(0,0,102)
            else:
                colordate = QColor("white")

            # user
            (dom, user) = re.split(r"\\", row[6])
            row[6] = user
            if user in ("SYSTEM", "NETWORK SERVICE", "LOCAL SERVICE", "ANONYMOUS LOGON"):
                textcolor = QColor("gray")
                font = QFont("Arial", 8)
            else:
                textcolor = QColor("black")
                font = QFont("Arial", 8)
                font.setBold(True)

            # insert line
            if newday:
                m = 0
                for j in range(len(row)-3):
                    text = ""
                    if j == 0:
                        text = date
                    newitem = QTableWidgetItem(text)
                    newitem.setBackgroundColor(QColor("black"))
                    self.setItem(n, m, newitem)
                    m += 1
                self.setRowHeight(n, 15)
                n += 1

            #for item in row:
            m = 0
            for j in range(len(row)):
                # skip these columns
                if j in (3,5,7):
                    continue
                
                item = row[j]
                if j == 1:
                    color2 = colordate
                else:
                    color2 = color

                newitem = QTableWidgetItem(item)
                newitem.setBackgroundColor(color2)
                newitem.setTextColor(textcolor)
                newitem.setFont(font)
                newitem.setTextAlignment(Qt.AlignTop)
                newitem.setToolTip(item)
                if j == 2:
                    newitem.setIcon(icon)
                self.setItem(n, m, newitem)
                m += 1
            self.setRowHeight(n, 16)
            n += 1

    def parse_event_log(self, ndays):
        """Parses event log file.
        Returns none.
        """
        # run dumpel.exe
        os.system("dumpel -f dumpel_results.txt -l security -d %s" % ndays)
        fin = open("dumpel_results.txt")
        lines = fin.readlines()
    
        # open output files
        #outfile = "audit_" + str(datetime.date.today()) + ".txt"
        fout=open(outfile,'w')
    
        # initialization
        fout.write("LOGON/LOGOFFS:\n")
        fout.write("--------------\n")
        output = "%-13s %3s %11s %12s %12s\n" % ('user','day','date','logon','logoff')
        fout.write(output)
        event578_count = 0
        user_logged_in = 0
        red_events = []
        orange_events = []
        yellow_events = []
        
        # loop on each line in the file
        for line in lines:
            items = re.split('\t', line)
            self.data.append(items)
            date = items[0]
            time = items[1]
            event = items[4]
            (dom, user) = re.split(r"\\", items[6])
            day = get_weekday(date)
            if event in red_eventids:
                red_events.append(line)
            elif event in orange_eventids:
                orange_events.append(line)
            elif event in yellow_eventids:
                yellow_events.append(line)
            elif user != "SYSTEM":
                #if (not user_logged_in) and (event == "528"):
                if event == "528":
                    if user_logged_in:
                        output = "%-13s %3s %11s %12s %12s\n" % (user,day,logon_date,logon_time,'no logout')
                        fout.write(output)
                    logon_time = time
                    logon_user = user
                    logon_date = date
                    logout_backup = ""
                    user_logged_in = 1
                        
                elif user_logged_in and event == "578":
                    if user == logon_user:
                        if event578_count == 0:
                            event578_count = 1
                        elif event578_count == 1:
                            event578_count = 0
                            user_logged_in = 0
                            output = "%-13s %3s %11s %12s %12s\n" % (user,day,logon_date,logon_time,time)
                            fout.write(output)
    
        # print red events
        fout.write("\nRED EVENTS:\n")
        fout.write("-----------\n")
        if len(red_events) == 0:
            fout.write("None.\n")
        else:
            for event in red_events:
                fout.write(event)
    
        # print orange events
        fout.write("\nORANGE EVENTS:\n")
        fout.write("--------------\n")
        if len(orange_events) == 0:
            fout.write("None.\n")
        else:
            for event in orange_events:
                fout.write(event)
    
        # print yellow events
        fout.write("\nYELLOW EVENTS:\n")
        fout.write("--------------\n")
        if len(yellow_events) == 0:
            fout.write("None.\n")
        else:
            for event in yellow_events:
                fout.write(event)
    
        # close files
        fin.close()
        fout.close()
    
        # print message
        print "Parse successful.\n"
        
        return

################################################################
################################################################
def get_weekday(date):
    (month,day,year) = re.split('/', date)
    weekday = datetime.date(int(year),int(month),int(day)).weekday()
    day_names = ['Mon','Tue','Wed','Thu','Fri','Sat','Sun']

    return day_names[weekday]

################################################################
################################################################
def format_datetime(date, thetime):
    (month,day,year) = re.split('/', date)
    dateobj = datetime.date(int(year),int(month),int(day))
    newdate = "%04d/%02d/%02d" % (dateobj.year, dateobj.month, dateobj.day)
    
    timeobj = time.strptime(thetime, "%I:%M:%S %p")
    newtime = time.strftime("%H:%M:%S", timeobj)
    
    return newdate + "\n" + newtime

################################################################
################################################################
def get_dateobject(date):
    (month,day,year) = re.split('/', date)
    return datetime.date(int(year),int(month),int(day))
  
################################################################
################################################################
if __name__ == "__main__":
    main()
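
The parsing part of the script can be tried without Qt or dumpel.exe. Below is a minimal sketch, written for Python 3, of how one tab-separated line is split into the fields the script uses (date, time, type, event ID, and domain\user). The function name parse_dumpel_line and the sample line are made up for illustration; the column positions are simply the ones the script above indexes.

```python
from datetime import datetime

DAY_NAMES = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']


def parse_dumpel_line(line):
    """Split one tab-separated line and pull out the fields the script uses."""
    items = line.rstrip("\r\n").split("\t")
    # date like 10/17/2006 and 12-hour time like 1:05:09 PM, as in format_datetime()
    stamp = datetime.strptime(items[0] + " " + items[1], "%m/%d/%Y %I:%M:%S %p")
    # unlike the tuple unpacking in the script, this tolerates a user with no domain
    domain, _, user = items[6].rpartition("\\")
    return {
        "when": stamp,
        "weekday": DAY_NAMES[stamp.weekday()],
        "type": items[2],
        "event": items[4],
        "domain": domain,
        "user": user,
    }


# Made-up sample line; the values in the columns the script skips are placeholders.
sample = "10/17/2006\t1:05:09 PM\t8\t2\t528\tSecurity\tMYDOMAIN\\alice\tN/A\tMYPC\tdetails"
record = parse_dumpel_line(sample)
print(record["when"].strftime("%Y/%m/%d %H:%M:%S"), record["weekday"], record["user"])
```

Parsing the date and time into a single datetime object replaces the separate get_weekday() and format_datetime() string handling, at the cost of depending on the locale understanding "AM" and "PM".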

QTableWidget Example using Python 2.4, QT 4.1.4, and PyQt

2006-10-25T16:24:00-07:00

QTableWidget Example using Python 2.4, QT 4.1.4, and PyQt

import sys
# PyQt4 bindings for Qt 4
from PyQt4.QtCore import *
from PyQt4.QtGui import *

lista = ['aa', 'ab', 'ac']
listb = ['ba', 'bb', 'bc']
listc = ['ca', 'cb', 'cc']
mystruct = {'A':lista, 'B':listb, 'C':listc}

class MyTable(QTableWidget):
    def __init__(self, thestruct, *args):
        QTableWidget.__init__(self, *args)
        self.data = thestruct
        self.setmydata()
        
    def setmydata(self):
        n = 0
        for key in self.data:
            m = 0
            for item in self.data[key]:
                newitem = QTableWidgetItem(item)
                self.setItem(m, n, newitem)
                m += 1
            n += 1

def main(args):
    app = QApplication(args)
    table = MyTable(mystruct, 5, 3)
    table.show()
    sys.exit(app.exec_())
    
if __name__=="__main__":
    main(sys.argv)
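
In setmydata, n is the column and m is the row, so each dictionary key fills one column. Python 2.4 dictionaries do not promise any particular key order, so the columns may not come out as A, B, C. One way around that, sketched here without Qt, is to sort the keys first. The generator name table_cells is hypothetical.

```python
def table_cells(struct):
    """Yield (row, column, text) for every entry, one column per key,
    with the keys taken in sorted order."""
    for col, key in enumerate(sorted(struct)):
        for row, text in enumerate(struct[key]):
            yield row, col, text


mystruct = {'B': ['ba', 'bb', 'bc'],
            'A': ['aa', 'ab', 'ac'],
            'C': ['ca', 'cb', 'cc']}

for row, col, text in table_cells(mystruct):
    print(row, col, text)
```

Inside MyTable, the loop body would become self.setItem(row, col, QTableWidgetItem(text)).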
