SaltyCrane Blog — Notes on JavaScript and web development

Notes on using EC2 command line tools

Create AWS accounts
Create an X.509 Certificate

Note: An X.509 Certificate is one type of Access Identifier. Access Identifiers are used to "identify yourself as the sender of a request to an AWS web service". There are two types of access identifiers: AWS Access Key Identifiers and X.509 Certificates. AWS Access Key Identifiers are supported by all Amazon Web Services, while X.509 Certificates are supported only by Amazon's EC2 and SQS services (see here for the chart). However, for some reason, the popular Java command line tools for EC2 only support X.509 Certificates (and not AWS Access Key Identifiers).

  • From Your Account page, select Access Identifiers.
  • In the "X.509 Certificate" section, click "Create New".
  • Download both the "Private Key" file and the "X.509 Certificate" file to the directory, ~/.ec2. (The private key file will be named something like pk-XXXXXXXXXXXXXXXXXXXXXX.pem and the X.509 Certificate file will be named something like cert-XXXXXXXXXXXXXXXXXXXXXX.pem.)
Install Java

The command line tools require Java version 5 or later. Only the JRE is required.

  • $ sudo apt-get install sun-java6-jre
Download Java Command-line Tools
Define environment variables
  • Add the following lines to your ~/.bashrc (or wherever you set your environment variables).
    export EC2_HOME=$HOME/lib/ec2-api-tools-1.3-24159
    export JAVA_HOME=/usr
    export EC2_PRIVATE_KEY=$HOME/.ec2/pk-XXXXXXXXXXXXXXXXXXXX.pem
    export EC2_CERT=$HOME/.ec2/cert-XXXXXXXXXXXXXXXXXXXX.pem
    export PATH=$PATH:$EC2_HOME/bin
    
  • Source your .bashrc or whichever file you used
    $ source ~/.bashrc
Test the command-line tools
  • Run the ec2-describe-images command to verify everything is working. It should list all the Ubuntu 8.xx images from Alestic.
    $ ec2-describe-images -a | grep alestic/ubuntu-8
    Results:
    IMAGE   ami-3a7c9953    alestic/ubuntu-8.04-hardy-base-20080419.manifest.xml    063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-75789d1c    alestic/ubuntu-8.04-hardy-base-20080424.manifest.xml    063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-ce44a1a7    alestic/ubuntu-8.04-hardy-base-20080430.manifest.xml    063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-2048ad49    alestic/ubuntu-8.04-hardy-base-20080514.manifest.xml    063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-6a57b203    alestic/ubuntu-8.04-hardy-base-20080517.manifest.xml    063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-26bc584f    alestic/ubuntu-8.04-hardy-base-20080628.manifest.xml    063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-179e7a7e    alestic/ubuntu-8.04-hardy-base-20080803.manifest.xml    063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-c0fa1ea9    alestic/ubuntu-8.04-hardy-base-20080905.manifest.xml    063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-38d43051    alestic/ubuntu-8.04-hardy-base-20080922.manifest.xml    063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-1cd73375    alestic/ubuntu-8.04-hardy-base-20080924.manifest.xml    063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-337c995a    alestic/ubuntu-8.04-hardy-desktop-20080419.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-4f789d26    alestic/ubuntu-8.04-hardy-desktop-20080424.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-f744a19e    alestic/ubuntu-8.04-hardy-desktop-20080430.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-1f4bae76    alestic/ubuntu-8.04-hardy-desktop-20080514.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-0e57b267    alestic/ubuntu-8.04-hardy-desktop-20080517.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-b5bc58dc    alestic/ubuntu-8.04-hardy-desktop-20080628.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-f39e7a9a    alestic/ubuntu-8.04-hardy-desktop-20080803.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-44c4202d    alestic/ubuntu-8.04-hardy-desktop-20080905.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-f7d4309e    alestic/ubuntu-8.04-hardy-desktop-20080922.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-88d733e1    alestic/ubuntu-8.04-hardy-desktop-20080924.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-bcbe5ad5    alestic/ubuntu-8.04-hardy-rightscale-20080701.manifest.xml      063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-27b95d4e    alestic/ubuntu-8.04-hardy-rightscale-20080703.manifest.xml      063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-b1ea0ed8    alestic/ubuntu-8.04-hardy-rightscale-20080824.manifest.xml      063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-47c4202e    alestic/ubuntu-8.04-hardy-rightscale-20080905.manifest.xml      063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-f4d4309d    alestic/ubuntu-8.04-hardy-rightscale-20080922.manifest.xml      063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-89d733e0    alestic/ubuntu-8.04-hardy-rightscale-20080924.manifest.xml      063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-dcbc58b5    alestic/ubuntu-8.10-intrepid-base-20080628.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-db9e7ab2    alestic/ubuntu-8.10-intrepid-base-20080804.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-9de105f4    alestic/ubuntu-8.10-intrepid-base-20080814.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-c3fa1eaa    alestic/ubuntu-8.10-intrepid-base-20080905.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-3bd43052    alestic/ubuntu-8.10-intrepid-base-20080922.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-1ad73373    alestic/ubuntu-8.10-intrepid-base-20080924.manifest.xml 063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-b6bc58df    alestic/ubuntu-8.10-intrepid-desktop-20080628.manifest.xml      063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-d69e7abf    alestic/ubuntu-8.10-intrepid-desktop-20080804.manifest.xml      063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-d4e206bd    alestic/ubuntu-8.10-intrepid-desktop-20080815.manifest.xml      063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-7dc22614    alestic/ubuntu-8.10-intrepid-desktop-20080908.manifest.xml      063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-f5d4309c    alestic/ubuntu-8.10-intrepid-desktop-20080922.manifest.xml      063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
    IMAGE   ami-b6d733df    alestic/ubuntu-8.10-intrepid-desktop-20080924.manifest.xml      063491364108    available       public          i386    machine aki-a71cf9ce    ari-a51cf9cc
Generate a keypair

In the second step, I created an X.509 Certificate, which is used to identify myself to Amazon Web Services. Now I need to create a keypair, which is used to log into a running EC2 instance. (Note: there is exactly one X.509 Certificate per user (i.e. AWS account), but a user can have many keypairs for logging into various EC2 instances.) See also the Generating a keypair section in the Getting Started Guide.

  • Generate the keypair. I named the keypair, disco-keypair because I will use this keypair with EC2 instances used to try out Disco.
    $ ec2-add-keypair disco-keypair > ~/.ec2/id_rsa-disco-keypair
        
  • Set the permissions on the private key
    $ chmod 600 ~/.ec2/id_rsa-disco-keypair
Run an EC2 instance
  • Select an image to run. I used the alestic/ubuntu-8.04-hardy-base-20080924 image with image ID ami-1cd73375.
  • Run the instance
    $ ec2-run-instances -k disco-keypair ami-1cd73375
    It should return something like:
    RESERVATION     r-568f5d3f      719606167433    default
    INSTANCE        i-339f3c5a      ami-1cd73375                    pending disco-keypair       0               m1.small        2008-09-28T00:50:35+0000        us-east-1c aki-a71cf9ce     ari-a51cf9cc
  • Check the status of the running instance:
    $ ec2-describe-instances
    After a short period of time, it should return something like:
    RESERVATION     r-568f5d3f      719606167433    default
    INSTANCE        i-339f3c5a      ami-1cd73375    ec2-75-101-200-13.compute-1.amazonaws.com       ip-10-251-30-10.ec2.internal     running disco-keypair   0               m1.small        2008-09-28T00:50:35+0000        us-east-1c      aki-a71cf9ce    ari-a51cf9cc
    Note the address ec2-75-101-200-13.compute-1.amazonaws.com. This is the external address used to connect to the instance. Also note the instance ID i-339f3c5a. This is needed to terminate the instance.
  • Authorize access to the instance through ports 22 (ssh) and 80 (http)
    $ ec2-authorize default -p 22
          GROUP           default
    PERMISSION              default ALLOWS  tcp     22      22      FROM    CIDR    0.0.0.0/0
    $ ec2-authorize default -p 80
    GROUP           default
    PERMISSION              default ALLOWS  tcp     80      80      FROM    CIDR    0.0.0.0/0
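The tab-separated output of ec2-describe-instances can be parsed mechanically. Here is a minimal Python sketch; the field positions are inferred from the sample output above, not from any official format specification:

```python
def parse_instances(output):
    """Extract (instance_id, public_dns, state) tuples from
    ec2-describe-instances output (tab-separated INSTANCE lines)."""
    results = []
    for line in output.splitlines():
        fields = line.split('\t')
        # INSTANCE lines: keyword, instance ID, AMI ID, public DNS,
        # private DNS, state, ... (positions inferred from sample output)
        if fields[0] == 'INSTANCE' and len(fields) >= 6:
            results.append((fields[1], fields[3], fields[5]))
    return results

sample = ('RESERVATION\tr-568f5d3f\t719606167433\tdefault\n'
          'INSTANCE\ti-339f3c5a\tami-1cd73375\t'
          'ec2-75-101-200-13.compute-1.amazonaws.com\t'
          'ip-10-251-30-10.ec2.internal\trunning\tdisco-keypair\t0\tm1.small')
print(parse_instances(sample))
# [('i-339f3c5a', 'ec2-75-101-200-13.compute-1.amazonaws.com', 'running')]
```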
SSH into instance
  • Use the address from the previous step to SSH into your instance:
    $ ssh -i ~/.ec2/id_rsa-disco-keypair -l root ec2-75-101-200-13.compute-1.amazonaws.com
Terminate the instance
  • $ ec2-terminate-instances i-339f3c5a
    which returns:
    INSTANCE        i-339f3c5a      running shutting-down
  • Running ec2-describe-instances shows that the instance is terminated.
    $ ec2-describe-instances 
    RESERVATION     r-568f5d3f      719606167433    default
    INSTANCE        i-339f3c5a      ami-1cd73375                    terminated      disco-keypair       0               m1.small        2008-09-28T00:50:35+0000           aki-a71cf9ce     ari-a51cf9cc

Python urlparse example

Here is an example of how to parse a URL using Python's urlparse module. See the urlparse module documentation for more information.

from urlparse import urlparse

url = 'http://www.gurlge.com:80/path/file.html;params?a=1#fragment'
o = urlparse(url)
print o.scheme
print o.netloc
print o.hostname
print o.port
print o.path
print o.params
print o.query
print o.fragment
print o.username
print o.password

Results:

http
www.gurlge.com:80
www.gurlge.com
80
/path/file.html
params
a=1
fragment
None
None
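In Python 3 this module moved to urllib.parse; here is the same example as a modern equivalent (not part of the original post), plus parse_qs for decoding the query string:

```python
from urllib.parse import urlparse, parse_qs

url = 'http://www.gurlge.com:80/path/file.html;params?a=1#fragment'
o = urlparse(url)
print(o.scheme)            # http
print(o.hostname, o.port)  # www.gurlge.com 80
print(o.path)              # /path/file.html
print(parse_qs(o.query))   # {'a': ['1']}
```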

How to get stdout and stderr using Python's subprocess module

I wrote previously about how to get stdout and stderr using os.popen4. However, per the Python documentation, using the subprocess module is preferred:

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several other, older modules and functions, such as:

os.system
os.spawn*
os.popen*
popen2.*
commands.*

See the subprocess module documentation for more information.

Here is how to get stdout and stderr from a program using the subprocess module:

from subprocess import Popen, PIPE, STDOUT

cmd = 'ls /etc/fstab /etc/non-existent-file'
p = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
output = p.stdout.read()
print output

Results:

ls: cannot access /etc/non-existent-file: No such file or directory
/etc/fstab
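If you want stdout and stderr separately rather than merged, Popen.communicate() is the safer call; it reads both pipes to completion and avoids the deadlocks that p.stdout.read() can hit with large output. A sketch using sys.executable so the example does not depend on ls being present:

```python
import sys
from subprocess import Popen, PIPE

# Run a child that writes to both streams; communicate() returns
# (stdout, stderr) once the child has finished.
child_code = "import sys; sys.stdout.write('out\\n'); sys.stderr.write('err\\n')"
p = Popen([sys.executable, '-c', child_code], stdout=PIPE, stderr=PIPE)
out, err = p.communicate()
print(out.decode())  # out
print(err.decode())  # err
```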

How to monitor an Apache web server using Monit

Monit is a tool that can monitor your Apache web server, MySQL database, or other daemon process. It can restart the service based on configurable conditions such as CPU usage, memory usage, number of children, etc. It can log status to a file, email status, and it has a web interface for monitoring or restarting the service. Here are the steps I took to install and configure the Monit tool on Ubuntu Hardy. My configuration merely monitors the status of my Apache web server and restarts it if it stops. It also checks whether the memory used by Apache is greater than 1 MB and logs it in /var/log/monit.log. For more configuration options, see the examples in the default /etc/monit/monitrc file or the configuration examples in the monit documentation. I also found Ubuntu Geek's guide to be very helpful.

  • Install monit
    $ sudo apt-get install monit
  • Edit the config file
    $ sudo nano /etc/monit/monitrc
    Insert the following:
    # check services every 2 minutes
    set daemon 120
    
    # logging 
    set logfile /var/log/monit.log
    
    # web interface
    set httpd port 2812 and
        use address localhost # only accept connection from localhost
        allow localhost       # allow localhost to connect to the server    
        allow admin:monit     # require user ‘admin’ with password ‘monit’
    
    # monitor apache
    check process apache2 with pidfile /var/run/apache2.pid
        start program = "/etc/init.d/apache2 start"
        if totalmem > 1.0 MB for 2 cycles then alert
  • Check the file syntax
    $ sudo monit -t
  • Enable the service
    $ sudo nano /etc/default/monit
    Change the following line:
    startup=1
  • Start monit
    $ sudo /etc/init.d/monit start
  • Point your browser at http://localhost:2812 and log in using the user "admin" and the password "monit".
  • Click on "apache2" and you can see information about the Apache process.

Simple cron example

Simple cron example (tested on Ubuntu):

  • Edit your (user) crontab file
    $ crontab -e
    This will bring up your editor (nano by default in Ubuntu)

  • Enter the following inside. This will append the current date to a log file every minute. The 6 fields of the crontab file are: minute, hour, day of month, month, day of week, command.
    * * * * * /bin/date >> /tmp/cron_output
    
    Be sure to put a blank line at the end of the file.
    (NOTE 1: >> only redirects STDOUT to a file. To redirect both STDOUT and STDERR, use something like /bin/date >> /tmp/cron_output 2>&1)
    (NOTE 2: If output is not redirected, cron will try to email the output to you. To do this, a mail transfer agent such as sendmail or postfix must be installed.)
    (NOTE 3 (added 2015-06-24): When I created my cron script in /etc/cron.d with Emacs using sudo::, cron didn't pick up my script. When I created it with nano, cron picked it up. The cause turned out to be the permissions of the cron script: Emacs created the script with 664 permissions while nano created it with 644 permissions. When I changed the permissions to 644, it started working. I am running Ubuntu 15.04. This Ask Ubuntu answer confirms that 664 permissions are problematic because group-writable cron scripts are considered insecure. See /var/log/syslog for cron messages. The Ask Ubuntu page has a lot of other good tips: Reasons why crontab does not work)

  • Exit the editor. It should output:
    crontab: installing new crontab
  • Check that it is working:
    tail -f /tmp/cron_output
    You should see the date updated every minute on the minute (or close to it):
    Tue Sep 16 23:58:01 PDT 2008
    Tue Sep 16 23:59:01 PDT 2008
    Wed Sep 17 00:00:01 PDT 2008
    Wed Sep 17 00:01:01 PDT 2008
    ...
    

See also my post: Postgres backup with cron
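The six-field layout described above can be split mechanically; here is a toy Python sketch (parse_crontab_line is a hypothetical helper for illustration, not part of cron):

```python
def parse_crontab_line(line):
    """Split a crontab entry into its 5 time fields plus the command.

    Hypothetical helper: splits on whitespace at most 5 times so the
    command (which may itself contain spaces) stays intact.
    """
    minute, hour, dom, month, dow, command = line.split(None, 5)
    return {'minute': minute, 'hour': hour, 'day_of_month': dom,
            'month': month, 'day_of_week': dow, 'command': command}

entry = parse_crontab_line('* * * * * /bin/date >> /tmp/cron_output')
print(entry['command'])  # /bin/date >> /tmp/cron_output
```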

Django Blog Project #16: Adding URL redirects using the Blogger API

I wanted to insert URL redirects on my old Blogger posts pointing to my new blog articles. A comment on my Migrating Blogger Posts post suggested that I use the (Python) Blogger API. This was a great suggestion. The Blogger API was well documented and easy to use. Here is the script I used to insert the URL redirects on each of my old Blogger posts.

from gdata import service
import re
import gdata
import atom

NEW_HTML = """
<script language="javascript">
  setTimeout('location.href="%s"', 2000);
</script>
<br /><br />
<b>
  </b><p>This is my OLD blog. I've copied this post over to my NEW blog at:</p>
  <p><a href="%s">%s</a></p>
  <p>You should be redirected in 2 seconds.</p>

<br /><br />
"""

# authenticate
blogger_service = service.GDataService('[email protected]', 'mypassword')
blogger_service.service = 'blogger'
blogger_service.account_type = 'GOOGLE'
blogger_service.server = 'www.blogger.com'
blogger_service.ProgrammaticLogin()

# get list of blogs
query = service.Query()
query.feed = '/feeds/default/blogs'
feed = blogger_service.Get(query.ToUri())

# get blog id
blog_id = feed.entry[0].GetSelfLink().href.split("/")[-1]

# get all posts
query = service.Query()
query.feed = '/feeds/%s/posts/default' % blog_id
query.published_min = '2000-01-01'
query.published_max = '2009-01-01'
query.max_results = 1000
feed = blogger_service.Get(query.ToUri())
print feed.title.text

for entry in feed.entry:
    # create link to article on new blog
    new_link = re.sub(r'http://iwiwdsmi\.blogspot\.com/(.*)\.html',
                      r'http://www.saltycrane.com/blog/\1/',
                      entry.link[0].href)
    print new_link

    # update post
    to_add = NEW_HTML % (new_link, new_link, new_link)
    entry.content.text = to_add + entry.content.text
    blogger_service.Put(entry, entry.GetEditLink().href)
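The URL rewrite at the heart of the script can be checked in isolation (same regex as above):

```python
import re

def old_to_new(url):
    """Map an old Blogger post URL to the new blog's URL scheme."""
    return re.sub(r'http://iwiwdsmi\.blogspot\.com/(.*)\.html',
                  r'http://www.saltycrane.com/blog/\1/',
                  url)

print(old_to_new('http://iwiwdsmi.blogspot.com/2008/01/my-post.html'))
# http://www.saltycrane.com/blog/2008/01/my-post/
```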

Notes on parallel processing with Python and Twisted

Twisted is a networking engine written in Python that, among many other things, can be used to do parallel processing. It is very big, though, so I had a hard time finding what I needed. I browsed through the Twisted Documentation and the Twisted O'Reilly book. There is also a Recipe in the Python Cookbook. However, I found Bruce Eckel's article, Concurrency with Python, Twisted, and Flex, to be the most helpful. (See also Bruce Eckel's initial article on Twisted: Grokking Twisted)

Here are my notes on running Bruce Eckel's example. I removed the Flex part because I didn't need or know anything about it. This example runs a Controller which starts a number of separate parallel processes running Solvers (a.k.a. workers). It also allows for communication between the Controller and the Solvers. Though this example only runs on one machine, the article says extending it to multiple machines is not difficult. For a good explanation of how this works, please see the original article.

Here is solver.py which is copied from the original article. The actual "work" is done in the step method. I only added some debugging print statements for myself.

"""
solver.py
Original version by Bruce Eckel
Solves one portion of a problem, in a separate process on a separate CPU
"""
import sys, random, math
from twisted.spread import pb
from twisted.internet import reactor

class Solver(pb.Root):

    def __init__(self, id):
        print "solver.py %s: solver init" % id
        self.id = id

    def __str__(self): # String representation
        return "Solver %s" % self.id

    def remote_initialize(self, initArg):
        return "%s initialized" % self

    def step(self, arg):
        "Simulate work and return result"
        print "solver.py %s: solver step" % self.id
        result = 0
        for i in range(random.randint(1000000, 3000000)):
            angle = math.radians(random.randint(0, 45))
            result += math.tanh(angle)/math.cosh(angle)
        return "%s, %s, result: %.2f" % (self, str(arg), result)

    # Alias methods, for demonstration version:
    remote_step1 = step
    remote_step2 = step
    remote_step3 = step

    def remote_status(self):
        print "solver.py %s: remote_status" % self.id
        return "%s operational" % self

    def remote_terminate(self):
        print "solver.py %s: remote_terminate" % self.id
        reactor.callLater(0.5, reactor.stop)
        return "%s terminating..." % self

if __name__ == "__main__":
    port = int(sys.argv[1])
    reactor.listenTCP(port, pb.PBServerFactory(Solver(sys.argv[1])))
    reactor.run()

Here is controller.py. This is also copied from the original article but I removed the Flex interface and created calls to start and terminate in the Controller class. I'm not sure if this makes sense, but at least this allowed me to run the example. I also moved the terminate method from the FlexInterface to the Controller.

"""
Controller.py
Original version by Bruce Eckel
Starts and manages solvers in separate processes for parallel processing.
"""
import sys
from subprocess import Popen
from twisted.spread import pb
from twisted.internet import reactor, defer

START_PORT = 5566
MAX_PROCESSES = 2

class Controller(object):

    def broadcastCommand(self, remoteMethodName, arguments, nextStep, failureMessage):
        print "controller.py: broadcasting..."
        deferreds = [solver.callRemote(remoteMethodName, arguments) 
                     for solver in self.solvers.values()]
        print "controller.py: broadcasted"
        reactor.callLater(3, self.checkStatus)

        defer.DeferredList(deferreds, consumeErrors=True).addCallbacks(
            nextStep, self.failed, errbackArgs=(failureMessage))
    
    def checkStatus(self):
        print "controller.py: checkStatus"
        for solver in self.solvers.values():
            solver.callRemote("status").addCallbacks(
                lambda r: sys.stdout.write(r + "\n"), self.failed, 
                errbackArgs=("Status Check Failed"))
                                                     
    def failed(self, results, failureMessage="Call Failed"):
        print "controller.py: failed"
        for (success, returnValue), (address, port) in zip(results, self.solvers):
            if not success:
                raise Exception("address: %s port: %d %s" % (address, port, failureMessage))

    def __init__(self):
        print "controller.py: init"
        self.solvers = dict.fromkeys(
            [("localhost", i) for i in range(START_PORT, START_PORT+MAX_PROCESSES)])
        self.pids = [Popen(["python", "solver.py", str(port)]).pid
                     for ip, port in self.solvers]
        print "PIDS: ", self.pids
        self.connected = False
        reactor.callLater(1, self.connect)

    def connect(self):
        print "controller.py: connect"
        connections = []
        for address, port in self.solvers:
            factory = pb.PBClientFactory()
            reactor.connectTCP(address, port, factory)
            connections.append(factory.getRootObject())
        defer.DeferredList(connections, consumeErrors=True).addCallbacks(
            self.storeConnections, self.failed, errbackArgs=("Failed to Connect"))

        print "controller.py: starting parallel jobs"
        self.start()

    def storeConnections(self, results):
        print "controller.py: storeconnections"
        for (success, solver), (address, port) in zip(results, self.solvers):
            self.solvers[address, port] = solver
        print "controller.py: Connected; self.solvers:", self.solvers
        self.connected = True

    def start(self):
        "controller.py: Begin the solving process"
        if not self.connected:
            return reactor.callLater(0.5, self.start)
        self.broadcastCommand("step1", ("step 1"), self.step2, "Failed Step 1")

    def step2(self, results):
        print "controller.py: step 1 results:", results
        self.broadcastCommand("step2", ("step 2"), self.step3, "Failed Step 2")

    def step3(self, results):
        print "controller.py: step 2 results:", results
        self.broadcastCommand("step3", ("step 3"), self.collectResults, "Failed Step 3")

    def collectResults(self, results):
        print "controller.py: step 3 results:", results
        self.terminate()
        
    def terminate(self):
        print "controller.py: terminate"
        for solver in self.solvers.values():
            solver.callRemote("terminate").addErrback(self.failed, "Termination Failed")
        reactor.callLater(1, reactor.stop)
        return "Terminating remote solvers"

if __name__ == "__main__":
    controller = Controller()
    reactor.run()

To run it, put the two files in the same directory and run python controller.py. You should see 2 CPUs (if you have 2) go up to 100% usage. And here is the screen output:

controller.py: init
PIDS:  [12173, 12174]
solver.py 5567: solver init
solver.py 5566: solver init
controller.py: connect
controller.py: starting parallel jobs
controller.py: storeconnections
controller.py: Connected; self.solvers: {('localhost', 5567): <RemoteReference ...>, ('localhost', 5566): <RemoteReference ...>}
controller.py: broadcasting...
controller.py: broadcasted
solver.py 5566: solver step
solver.py 5567: solver step
controller.py: checkStatus
solver.py 5566: remote_status
Solver 5566 operational
solver.py 5567: remote_status
controller.py: step 1 results: [(True, 'Solver 5567, step 1, result: 683825.75'), (True, 'Solver 5566, step 1, result: 543177.17')]
controller.py: broadcasting...
controller.py: broadcasted
Solver 5567 operational
solver.py 5566: solver step
solver.py 5567: solver step
controller.py: checkStatus
solver.py 5566: remote_status
Solver 5566 operational
solver.py 5567: remote_status
controller.py: step 2 results: [(True, 'Solver 5567, step 2, result: 636793.90'), (True, 'Solver 5566, step 2, result: 335358.16')]
controller.py: broadcasting...
controller.py: broadcasted
Solver 5567 operational
solver.py 5566: solver step
solver.py 5567: solver step
controller.py: checkStatus
solver.py 5566: remote_status
Solver 5566 operational
solver.py 5567: remote_status
controller.py: step 3 results: [(True, 'Solver 5567, step 3, result: 847386.43'), (True, 'Solver 5566, step 3, result: 512120.15')]
controller.py: terminate
Solver 5567 operational
solver.py 5566: remote_terminate
solver.py 5567: remote_terminate
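For comparison, the same scatter/gather idea can be sketched with only the standard library's multiprocessing module (a modern stand-in, not part of the original article; the work loop mirrors Solver.step, with a smaller iteration count):

```python
import math
import random
from multiprocessing import Pool

def step(arg):
    """Simulate work, like Solver.step in solver.py (smaller loop)."""
    result = 0.0
    for _ in range(random.randint(10000, 30000)):
        angle = math.radians(random.randint(0, 45))
        result += math.tanh(angle) / math.cosh(angle)
    return '%s, result: %.2f' % (arg, result)

if __name__ == '__main__':
    # Fan out one "step" per worker process, then gather the results.
    with Pool(2) as pool:
        for line in pool.map(step, ['step 1a', 'step 1b']):
            print(line)
```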

Notes on starting processes in Python

Using os.fork()

Here is an example using os.fork() to spawn 5 processes, each running the Python function myfunc. Don't forget the os._exit() at the end. Per the docs, sys.exit() is normally used, but os._exit() can be used in child processes after a fork; it does not call cleanup handlers, flush stdio buffers, etc.

import os
import time

def myfunc(i):
    print "sleeping 5 seconds from process %s" % i
    time.sleep(5)
    print "finished sleeping from process %s" % i

for i in range(5):
    pid = os.fork()
    if pid == 0:
        myfunc(i)
        os._exit(0)

Results:

sleeping 5 seconds from process 0
sleeping 5 seconds from process 1
sleeping 5 seconds from process 2
sleeping 5 seconds from process 3
sleeping 5 seconds from process 4

And 5 seconds later...

finished sleeping from process 0
finished sleeping from process 1
finished sleeping from process 2
finished sleeping from process 3
finished sleeping from process 4
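If the parent needs to know when the children finish, it can collect them with os.waitpid; a sketch (POSIX-only, since os.fork is unavailable on Windows):

```python
import os

pids = []
for i in range(3):
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately, using the exit status to carry a
        # small integer result back to the parent.
        os._exit(i)
    pids.append(pid)

codes = []
for pid in pids:
    _, status = os.waitpid(pid, 0)  # block until this child exits
    codes.append(os.WEXITSTATUS(status))
print(sorted(codes))  # [0, 1, 2]
```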
Running an external script in subprocesses

Alternatively, if you want to run an external script in multiple processes, you can use the Popen class in the subprocess module. For example, to run the following script, called "myscript.py":

"myscript.py"
import sys
import time

def myfunc(i):
    print "sleeping 5 seconds from process %s" % i
    time.sleep(5)
    print "finished sleeping from process %s" % i

if __name__ == '__main__':
    myfunc(sys.argv[1])

use the following Python code stored in the same directory:

"popen_ex.py"
from subprocess import Popen

for i in range(5):
    Popen(['python', './myscript.py', str(i)])

The screen output is the same as the previous example. What's the difference? fork() copies the process memory space, including open file descriptors, to the child process. In the second example, since I am executing a new Python interpreter from scratch, I get a "cleaner" start but probably more overhead as well.
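A self-contained variant of the Popen fan-out, using sys.executable with an inline -c script instead of a separate file, and waiting for the children to finish:

```python
import sys
from subprocess import Popen

# Each child just exits with the integer passed on its command line.
child = 'import sys; sys.exit(int(sys.argv[1]))'
procs = [Popen([sys.executable, '-c', child, str(i)]) for i in range(3)]
codes = [p.wait() for p in procs]  # each child's exit status
print(codes)  # [0, 1, 2]
```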

Django Blog Project #15: New site logo

I now have a new site logo design drawn by my wife, Angela! Doesn't it look great? My previous logo was a crane picture I had just pulled from the web somewhere. So it is nice to have a custom logo done for me. Luckily my wife is artistic and didn't mind drawing it for me. I also made some minor changes to the title block to make things look a little better up there. Now to figure out how to style the rest of the page.

I also got a "memory over limit" warning from Webfaction this week. Over the weekend, I had redirected all my old Blogger posts to this blog, so apparently the small increase in traffic brought to light some of my inefficient code. To help solve the problem, I switched over to django-tagging. This eliminated a bunch of my inefficient code and I appear to be within the memory limits now. There is still another section of code I need to rework, but this solves the problem for now. Django-tagging is pretty cool. I haven't quite got everything working correctly, but I will be sure to write some notes on it when I get the time.

Simplistic Python Thread example

Here is a simple Python example using the Thread object in the threading module.

import time
from threading import Thread

def myfunc(i):
    print "sleeping 5 sec from thread %d" % i
    time.sleep(5)
    print "finished sleeping from thread %d" % i

for i in range(10):
    t = Thread(target=myfunc, args=(i,))
    t.start()

Results:

sleeping 5 sec from thread 0
sleeping 5 sec from thread 1
sleeping 5 sec from thread 2
sleeping 5 sec from thread 3
sleeping 5 sec from thread 4
sleeping 5 sec from thread 5
sleeping 5 sec from thread 6
sleeping 5 sec from thread 7
sleeping 5 sec from thread 8
sleeping 5 sec from thread 9

...and 5 seconds later:

finished sleeping from thread 0
finished sleeping from thread 1
finished sleeping from thread 2
finished sleeping from thread 3
finished sleeping from thread 4
finished sleeping from thread 5
finished sleeping from thread 6
finished sleeping from thread 7
finished sleeping from thread 8
finished sleeping from thread 9
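To wait for all threads to finish (rather than letting the main thread simply fall off the end), call join() on each Thread; a sketch:

```python
import time
from threading import Thread

results = []

def myfunc(i):
    time.sleep(0.1)
    results.append(i)  # list.append is atomic under CPython's GIL

threads = [Thread(target=myfunc, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # block until this thread finishes
print(sorted(results))  # [0, 1, 2, 3, 4]
```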