SaltyCrane Blog — Notes on JavaScript and web development

Using a Python timeout decorator for uploading to S3

At work we are uploading many images to S3 using Python's boto library. However we are experiencing a RequestTimeTooSkewed error once every 100 uploads on average. We googled, but did not find a solution. Our system time was in sync and our file sizes were small (~50KB).

Since we couldn't find the root cause, we added a watchdog timer as a bandaid solution. We already use a retry decorator to retry uploads to S3 when we get a 500 Internal Server Error response. To this we added a timeout decorator which cancels the S3 upload if it takes more than a couple minutes. With this decorator, we don't have to wait the full 15 minutes before S3 returns the 403 Forbidden (RequestTimeTooSkewed error) response.

I found the timeout decorator at Activestate's Python recipes. It makes use of Python's signal library. Below is an example of how it's used.

import signal

class TimeoutError(Exception):
    def __init__(self, value = "Timed Out"):
        self.value = value
    def __str__(self):
        return repr(self.value)

def timeout(seconds_before_timeout):
    def decorate(f):
        def handler(signum, frame):
            raise TimeoutError()
        def new_f(*args, **kwargs):
            old = signal.signal(signal.SIGALRM, handler)
            signal.alarm(seconds_before_timeout)
            try:
                result = f(*args, **kwargs)
            finally:
                # reinstall the old signal handler
                signal.signal(signal.SIGALRM, old)
                # cancel the alarm
                # this line should be inside the "finally" block (per Sam Kortchmar)
                signal.alarm(0)
            return result
        new_f.func_name = f.func_name
        return new_f
    return decorate

Try it out:

import time

@timeout(5)
def mytest():
    print "Start"
    for i in range(1,10):
        time.sleep(1)
        print "%d seconds have passed" % i

if __name__ == '__main__':
    mytest()

Results:

Start
1 seconds have passed
2 seconds have passed
3 seconds have passed
4 seconds have passed
Traceback (most recent call last):
  File "timeout_ex.py", line 47, in <module>
    function_times_out()
  File "timeout_ex.py", line 17, in new_f
    result = f(*args, **kwargs)
  File "timeout_ex.py", line 42, in function_times_out
    time.sleep(1)
  File "timeout_ex.py", line 12, in handler
    raise TimeoutError()
__main__.TimeoutError: 'Timed Out'

Bug found by Sam Kortchmar (added 2018-08-18)

The code on the Activestate recipe has signal.alarm(0) outside of the finally block, but Sam Kortchmar reported to me that it needs to be inside the finally block so that the alarm will be cancelled even if there is an exception in the user's function that is handled by the user. With signal.alarm(0) outside of the finally block, the alarm still fires in that case.

Here is the test case sent by Sam:

import unittest2
import time

class TestTimeout(unittest2.TestCase):
    def test_watchdog_doesnt_kill_interpreter(self):
        """If this test executes at all, it's working!
        otherwise, the whole testing section will be killed
        and print out "Alarm clock"
        """
        @timeout(1)
        def my_func():
            raise Exception

        try:
            my_func()
        except Exception:
            pass
        time.sleep(1.2)
        assert True

The RequestTimeTooSkewed error

S3ResponseError: 403 Forbidden
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>RequestTimeTooSkewed</Code><Message>The difference between the request time and the current time is too large.</Message><MaxAllowedSkewMilliseconds>900000</MaxAllowedSkewMilliseconds><RequestId>7DDDS67HF8E37</RequestId><HostId>LKE893JFGDLASKJR9BJ-A9NASFPNAOPWEJORG-98DFGJA498JVJ-A04320JF0293JLKE</HostId><RequestTime>Tue, 27 Apr 2010 22:20:58 GMT</RequestTime><ServerTime>2010-04-27T22:55:24Z</ServerTime></Error>

See also

Comments


#1 Edwrrd M. Goldberg commented on :

The problem is that the internal "hardware" clock is off the mark. You need to NTPD on that server and re-sync the clock. It is important to skip the use the local hardware clock and adjust time ONLY to the NTP time standard.

This issue comes up most of the time on Windows EC2 Servers. But I have seen CentOS servers that have this issue also.

NTPD or xntpd is the cure. Just add this service to the server with yum and set up the external resources.

Edward M. Goldberg http://myCloudWaatcher.com/ e.m.g.


#3 Eliot commented on :

Edwrrd: We did apt-get install ntp on our Ubuntu Hardy server and still got the RequestTimeTooSkewed errors.

Alexander: I don't know much about thread safety so I'm not sure.


#4 Kevin commented on :

Hey Eliot, I know this is an old thread, but did you ever figure out the root cause of your _RequestTimeTooSkewed_ errors? I think I'm running into the same issue -- NTPD is installed and the time on the server is correct, but requests will randomly hang for 15 minutes before returning a 403 error. Subsequent requests work immediately.

I'll probably end up implementing a similar fix, but I'd like to fix the source of the problem if I can...


#5 Eliot commented on :

Hi Kevin, I don't think I ever did find the root cause. If I did, I don't remember now. Sorry.


#6 Mansour commented on :

I have sort of run into related issue with boto and Google Storage and I seem to be able to use socket.setdefaulttimeout() function:

import socket
socket.setdefaulttimeout(30)

It sets the timeout on reads/writes to newly created sockets. Currently the only advantage of using this function over the alarm signal approach that I can think of is the ability to use it in a multithreaded environment (@Alexander signals and threads don't match well).

Mansour