At work we are uploading many images to S3 using Python's boto library. However we are experiencing a RequestTimeTooSkewed error once every 100 uploads on average. We googled, but did not find a solution. Our system time was in sync and our file sizes were small (~50KB).
Since we couldn't find the root cause, we added a watchdog timer as a bandaid solution. We already use a retry decorator to retry uploads to S3 when we get a 500 Internal Server Error response. To this we added a timeout decorator which cancels the S3 upload if it takes more than a couple minutes. With this decorator, we don't have to wait the full 15 minutes before S3 returns the 403 Forbidden (RequestTimeTooSkewed error) response.
import signal class TimeoutError(Exception): def __init__(self, value = "Timed Out"): self.value = value def __str__(self): return repr(self.value) def timeout(seconds_before_timeout): def decorate(f): def handler(signum, frame): raise TimeoutError() def new_f(*args, **kwargs): old = signal.signal(signal.SIGALRM, handler) signal.alarm(seconds_before_timeout) try: result = f(*args, **kwargs) finally: # reinstall the old signal handler signal.signal(signal.SIGALRM, old) # cancel the alarm # this line should be inside the "finally" block (per Sam Kortchmar) signal.alarm(0) return result new_f.func_name = f.func_name return new_f return decorate
Try it out:
import time @timeout(5) def mytest(): print "Start" for i in range(1,10): time.sleep(1) print "%d seconds have passed" % i if __name__ == '__main__': mytest()
Start 1 seconds have passed 2 seconds have passed 3 seconds have passed 4 seconds have passed Traceback (most recent call last): File "timeout_ex.py", line 47, in <module> function_times_out() File "timeout_ex.py", line 17, in new_f result = f(*args, **kwargs) File "timeout_ex.py", line 42, in function_times_out time.sleep(1) File "timeout_ex.py", line 12, in handler raise TimeoutError() __main__.TimeoutError: 'Timed Out'
Bug found by Sam Kortchmar (added 2018-08-18)
The code on the Activestate recipe has
signal.alarm(0) outside of the
finally block, but Sam Kortchmar reported to me that it needs to be inside the
finally block so that the alarm will be cancelled even if there is an exception in the user's function that is handled by the user. With
signal.alarm(0) outside of the
finally block, the alarm still fires in that case.
Here is the test case sent by Sam:
import unittest2 import time class TestTimeout(unittest2.TestCase): def test_watchdog_doesnt_kill_interpreter(self): """If this test executes at all, it's working! otherwise, the whole testing section will be killed and print out "Alarm clock" """ @timeout(1) def my_func(): raise Exception try: my_func() except Exception: pass time.sleep(1.2) assert True
The RequestTimeTooSkewed error
S3ResponseError: 403 Forbidden <?xml version="1.0" encoding="UTF-8"?> <Error><Code>RequestTimeTooSkewed</Code><Message>The difference between the request time and the current time is too large.</Message><MaxAllowedSkewMilliseconds>900000</MaxAllowedSkewMilliseconds><RequestId>7DDDS67HF8E37</RequestId><HostId>LKE893JFGDLASKJR9BJ-A9NASFPNAOPWEJORG-98DFGJA498JVJ-A04320JF0293JLKE</HostId><RequestTime>Tue, 27 Apr 2010 22:20:58 GMT</RequestTime><ServerTime>2010-04-27T22:55:24Z</ServerTime></Error>
The problem is that the internal "hardware" clock is off the mark. You need to NTPD on that server and re-sync the clock. It is important to skip the use the local hardware clock and adjust time ONLY to the NTP time standard.
This issue comes up most of the time on Windows EC2 Servers. But I have seen CentOS servers that have this issue also.
NTPD or xntpd is the cure. Just add this service to the server with yum and set up the external resources.
Edward M. Goldberg http://myCloudWaatcher.com/ e.m.g.
This is not thread-safe decorator, right?
Edwrrd: We did
apt-get install ntp on our Ubuntu Hardy server and still got the
Alexander: I don't know much about thread safety so I'm not sure.
Hey Eliot, I know this is an old thread, but did you ever figure out the root cause of your _RequestTimeTooSkewed_ errors? I think I'm running into the same issue -- NTPD is installed and the time on the server is correct, but requests will randomly hang for 15 minutes before returning a 403 error. Subsequent requests work immediately.
I'll probably end up implementing a similar fix, but I'd like to fix the source of the problem if I can...
Hi Kevin, I don't think I ever did find the root cause. If I did, I don't remember now. Sorry.
I have sort of run into related issue with boto and Google Storage and I seem to be able to use
import socket socket.setdefaulttimeout(30)
It sets the timeout on reads/writes to newly created sockets. Currently the only advantage of using this function over the alarm signal approach that I can think of is the ability to use it in a multithreaded environment (@Alexander signals and threads don't match well).