How (not) to set a timeout on a computation in Python

August 22nd, 2011 at 5:50 am

A common question that comes up in mailing lists and Stack Overflow is how to set a timeout on some function call or computation in Python. When people ask this question, they usually imagine the following scenario: some function their code is calling can run for too long, and they want to make sure this doesn’t happen, so after some pre-set timeout the computation should terminate and the program is free to do something else. Oh, and this should work on all platforms, of course.

It turns out this seemingly simple task is hard to do in Python. Here I want to discuss some solutions commonly proposed, with their drawbacks.

One of the solutions is to use signal.SIGALRM. Apart from the trickiness of using the signal module within multi-threaded applications (read this for more details), there’s a big problem – SIGALRM is only supported on Unix platforms. If you need this code to run on Windows, you’re out of luck.
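
For the record, here is roughly what the SIGALRM approach looks like. This is a minimal sketch (the timelimit wrapper and its signature are mine, not from any particular recipe); it works only on Unix, and only when called from the main thread:

```python
import signal

class TimeLimitExpired(Exception):
    pass

def timelimit(timeout, func, args=(), kwargs=None):
    """Run func, raising TimeLimitExpired if it doesn't finish
    within timeout seconds. Unix-only, main thread only."""
    def handler(signum, frame):
        raise TimeLimitExpired()
    old_handler = signal.signal(signal.SIGALRM, handler)
    signal.alarm(timeout)  # alarm() only takes whole seconds
    try:
        return func(*args, **(kwargs or {}))
    finally:
        signal.alarm(0)  # cancel a pending alarm, if any
        signal.signal(signal.SIGALRM, old_handler)
```

Note that the alarm is canceled and the old handler restored in a finally block, so a function that returns in time doesn’t get interrupted later by a stray alarm.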

Another common "solution" I’ve seen is the following [1]. I’ve simplified the code to make the point clearer, ignoring exceptions and other special conditions:

class TimeLimitExpired(Exception): pass

def timelimit(timeout, func, args=(), kwargs={}):
    """ Run func with the given timeout. If func didn't finish running
        within the timeout, raise TimeLimitExpired
    """
    import threading
    class FuncThread(threading.Thread):
        def __init__(self):
            threading.Thread.__init__(self)
            self.result = None

        def run(self):
            self.result = func(*args, **kwargs)

    it = FuncThread()
    it.start()
    it.join(timeout)
    if it.isAlive():
        raise TimeLimitExpired()
    else:
        return it.result

The trick here is to run the function inside a thread and use the timeout argument of Thread.join to implement the time limit. join will return after timeout whether the thread (i.e. the function) stopped running or not. If it’s still running (isAlive() returns True) then the time limit exception is raised.

I hope the problem here is obvious. Think about it for a moment – suppose the function didn’t finish within the given timeout, what happens to the thread after the exception is raised? Nothing – it just keeps on happily running. If the function call never returns for some reason, we’ve just made ourselves a "zombie" thread that will continue executing, consuming CPU resources.

What we really need to do is to somehow kill the thread if the timeout expires. Whoops, we’re in trouble. Threads can’t be killed in Python, and for a good reason.

This is why I was very surprised to find [2] an "improved" version of the approach presented above. Again, the code is simplified to keep only the relevant parts:

class TimeLimitExpired(Exception): pass

def timelimit(timeout, func, args=(), kwargs={}):
    """ Run func with the given timeout. If func didn't finish running
        within the timeout, raise TimeLimitExpired
    """
    import threading
    class FuncThread(threading.Thread):
        def __init__(self):
            threading.Thread.__init__(self)
            self.result = None

        def run(self):
            self.result = func(*args, **kwargs)

        def _stop(self):
            if self.isAlive():
                threading.Thread._Thread__stop(self)

    it = FuncThread()
    it.start()
    it.join(timeout)
    if it.isAlive():
        it._stop()
        raise TimeLimitExpired()
    else:
        return it.result

Whoa – what is that? Thread._Thread__stop is a call to the private, name-mangled method __stop of the Thread class, with the apparent hope that this method actually causes a thread to stop. But it doesn’t! All it does is set the internal Thread.__stopped flag that allows join to return earlier. You can’t kill threads in Python, remember? So this approach is just a fallacy based on a misunderstanding of the internals of Thread [3].

Some even more sophisticated "solutions" propose to ditch the Python-level API and just brutally kill a thread with pthread_kill (on Unix) or TerminateThread (on Windows). This is a very bad idea. Even low-level APIs like pthreads, which do provide a means to kill threads, recommend avoiding it. In Python it’s even more problematic because of the way the interpreter works: if the thread you kill happens to hold the GIL, you’re most likely going to have a deadlock.

Other non-solutions include using sys.settrace in the thread. In addition to making the thread code horribly slow, this will also fail to work when the thread calls into C functions. The same is true for approaches attempting to raise an exception in another thread – the exception will get ignored if the thread is busy inside some C call.

This is where many people give up on threads and suggest using sub-processes instead. However, a process is not as lightweight as a thread, and if you need to run many functions with a timeout (or run a single function often) you have to be aware of the costs of creating and destroying a child process each time. Besides, if the sub-process has access to some shared resources, many of the troubles of threads surface here too.

In general, though, with some care sub-processes can be made to work. The multiprocessing package can even make processes as simple to use as threads, exposing similar APIs to threading. Additionally, it provides the multiprocessing.Pool class that can help lower the costs of process creation and destruction – assuming that the function we want to time out does terminate, most of the time, before the timeout is reached.
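
Here is a sketch of how that might look (the names run_with_timeout and slow_square are mine, for illustration). The worker keeps running after the timeout fires, but unlike a thread, a process can actually be killed, with terminate:

```python
import multiprocessing
import time

def slow_square(n):
    # Stand-in for a potentially long-running computation.
    time.sleep(n)
    return n * n

def run_with_timeout(pool, timeout, func, *args):
    """Run func in a pool worker; raise multiprocessing.TimeoutError
    if the result isn't ready within timeout seconds."""
    async_result = pool.apply_async(func, args)
    return async_result.get(timeout)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=1)
    print(run_with_timeout(pool, 5, slow_square, 1))  # prints 1
    try:
        run_with_timeout(pool, 1, slow_square, 30)
    except multiprocessing.TimeoutError:
        print('timed out')
    # The worker is still grinding away on slow_square(30) at this
    # point - terminate() is what actually kills it.
    pool.terminate()
```

One caveat: functions handed to the pool must be picklable, so they have to be defined at module level.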

Another reasonable solution is to make the computation cooperative, i.e. call back on the invoking code occasionally asking if it’s time to finish. This is a technique well known to GUI programmers, where a function invoked from the GUI main loop should not run for too long, and should break its work into chunks.
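
One simple way to arrange this outside a GUI is to hand the computation a threading.Event that it checks between chunks of work. The sketch below is mine (compute and its chunking are purely illustrative); the point is that when the timeout expires, the thread is asked to stop and actually exits, so no zombie is left behind:

```python
import threading

def compute(stop_event, n_chunks):
    """A cooperative computation: between chunks of work it checks
    stop_event and bails out early when asked to stop."""
    total = 0
    for i in range(n_chunks):
        if stop_event.is_set():
            return None       # interrupted; caller sees a sentinel
        total += i            # one "chunk" of real work goes here
    return total

def timelimit(timeout, n_chunks):
    """Run compute in a thread; if it overruns timeout, ask it to
    stop and wait for it - no zombie thread is left behind."""
    stop_event = threading.Event()
    result = []
    t = threading.Thread(
        target=lambda: result.append(compute(stop_event, n_chunks)))
    t.start()
    t.join(timeout)
    if t.is_alive():
        stop_event.set()      # politely request termination...
        t.join()              # ...and the thread obliges promptly
    return result[0]
```

Checking the event on every iteration has a cost; real code would typically check once every few thousand chunks.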

An additional aspect to consider is that often long-running computations involve IO such as sockets. In this case, if a timeout is required, it’s recommended to use asynchronous IO which naturally supports interruptions. Unfortunately, asynchronous IO also makes code more convoluted and difficult to write. Frameworks exist to alleviate this burden – the best known for Python is probably Twisted. Take a look at it – it’s a bag full of solutions for your IO problems.
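
Under the hood, frameworks like Twisted are built on OS-level readiness notification (select, poll and friends), which is what makes timeouts natural there. As a taste of the underlying mechanism, here is a minimal hand-rolled sketch (recv_with_timeout is my name, not a library API):

```python
import select
import socket

def recv_with_timeout(sock, timeout, bufsize=4096):
    """Wait up to timeout seconds for data on sock; return the
    received bytes, or None if the timeout expired first."""
    ready, _, _ = select.select([sock], [], [], timeout)
    if not ready:
        return None  # nothing arrived in time; sock is untouched
    return sock.recv(bufsize)
```

The crucial property is that the waiting happens inside select with a timeout, not inside a blocking recv, so the caller regains control when the deadline passes.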

So what we’ve seen here is a relatively simple problem, which unfortunately has no really simple solution in Python. The blame here is on the problem, not the language, however. Even in languages that do allow killing threads (for example, C with native OS APIs), this is a discouraged practice – fickle and hard to get exactly right.


[1] http://code.activestate.com/recipes/473878-timeout-function-using-threading/
[2] http://code.activestate.com/recipes/576780-timeout-for-nearly-any-callable/
[3] If you’re skeptical, make func a simple endless loop that prints something out every once in a while. You’ll note that this keeps getting printed even after the timeout has expired and the thread was "stopped". Checking threading.active_count() is another telling clue.

16 Responses to “How (not) to set a timeout on a computation in Python”

  1. David Fraser Says:

    Good summary. As you say, the best approach is to write the code so it knows points at which it can be interrupted. I’ve found setting threading.Event() objects to signal the desire to abort is helpful here

    I’ve done the naughty TerminateThread thing, but only on shutdown – it seems to work better than having to wait for the remaining threads in some cases

    Another option is to call into the C API ThreadAsyncRaise through ctypes – this isn’t always successful but can sometimes work.

    http://trac.sjsoft.com/browser/trunk/j5/src/j5/OS/ThreadRaise.py and http://trac.sjsoft.com/browser/trunk/j5/src/j5/OS/ThreadControl.py for some example code

  2. eliben Says:

    David Fraser,

    Good summary. As you say, the best approach is to write the code so it knows points at which it can be interrupted. I’ve found setting threading.Event() objects to signal the desire to abort is helpful here

    I also like threading.Event() for cooperative signaling between threads.

    I’ve done the naughty TerminateThread thing, but only on shutdown – it seems to work better than having to wait for the remaining threads in some cases

    Indeed, when you’re shutting down and going to exit anyway, forcibly killing the thread is OK.

    Another option is to call into the C API ThreadAsyncRaise through ctypes – this isn’t always successful but can sometimes work.

    I mentioned this approach in the post – the problem is that it won’t really raise the exception if the thread is currently executing C code.

  3. Sycren Says:

    Would it not be possible to break up a function into smaller pieces which each check against the timeout?
    So instead of trying to kill the thread (which is not possible), adding the checks more often would let you divert the thread, easing up on computation.

  4. eliben Says:

    Sycren,

    Definitely. This is exactly what I meant in the post in the paragraph starting with “Another reasonable solution is to make the computation cooperative, i.e. call back on the invoking code occasionally asking if it’s time to finish”

  5. Steve Says:

    Heads-up: setting a default arg to a mutable object in python is a big no-no.

    For an example of why, run this:

    def foo(a, b=[]):
      b.append(a)
      print b
    
    foo('test')
    foo('test2')

    The output will be:

    ['test']
    ['test', 'test2']

    This is because the defaults are defined at the function level, not within the function.

  6. Joe Says:

    I use multiprocessing (backticks doesn’t work, at least in the preview)

    import time
    from multiprocessing import Process

    class TestClass():
        def test_f(self, name):
            ctr = 0
            while True:
                ctr += 1
                print name, ctr
                time.sleep(1.0)

    if __name__ == '__main__':
        CT = TestClass()
        p = Process(target=CT.test_f, args=('P1',))
        p.start()

        ## sleep for 5 seconds and terminate
        time.sleep(5.0)
        p.terminate()

  7. eliben Says:

    Steve,

    Default values for keyword arguments are a no-no if the function is going to modify the arguments. If they’re read only, it’s OK.

    Anyhow, this is not the part of the sample code I wanted to highlight :-) But your observation is correct.

  8. Travis Cline Says:

    I just wanted to note that with coroutine libraries in Python, asynchronous IO is not really convoluted. See how one utilizes gevent, for example. Here is how you approach timeouts in gevent: http://www.gevent.org/gevent.html#timeouts

  9. Paul Says:

    Generally a good roundup until you dismissed the subprocess method as being too heavyweight. Too heavy compared to what other sensible way of handling this? As though making the computation co-operative was always simple or even possible. And on what evidence is process start and teardown too resource intensive? You’ve just glossed over the best solution aside for no good reason.

  10. eliben Says:

    Paul,

    I don’t see how I have “dismissed” the subprocess method. I’ve just said “you have to be aware of the costs of creating and destroying a child process each time.”

  11. matt harrison Says:

    I’ve implemented the multiprocessing timeout that you refer to: http://panela.blog-city.com/how_to_timeout_in_python_with_multiprocessing.htm

  12. Steve Says:

    I am aware that it doesn’t cause an issue in this specific code chunk (although I must admit, I had to check to see what would happen if **kwargs was taken in and modified by the function pointed to by func – it turns out that it makes a copy of the keyword args at some point, so even if you modify it, the original is not affected).

    My point is that bad practices should probably still be considered bad practices even if they don’t cause a bug in a given example. It’s not immediately obvious to somebody who has not encountered this issue that modifying kwargs will result in this issue.

  13. eliben Says:

    Matt,

    Well done.

    multiprocessing.Process.terminate is the key ingredient that’s lacking from thread APIs ;-)

  14. Nick Coghlan Says:

    As of Python 3.3, the stdlib will have a faulthandler module which can be used to detect and report hung threads, although it won’t try to interrupt them.

    http://docs.python.org/dev/library/faulthandler

  15. eliben Says:

    Nick,

    faulthandler is neat and I’ve been following its development with interest. However, will it really be applicable to this problem? AFAIU it can only be used to force dump of all threads, or give some thread the ability to dump its own traceback.

  16. Thomas Fanslau Says:

    First thing I learned about Threads nearly 30 years ago: Never stop them from the outside. Even thinking how to do that is wrong. Just imagine the thread as a black box that you have no idea what it is doing right now when you “pull the plug”, and you see all these resource allocation problems creeping up.

    In planning I always treat Threads as external far away teflon-coated Black-Box-Processes that by “accident” share memory with me. Find a way to talk with them. Tell them to go away … Accept if they don’t want to … Like In-Laws that you can’t stand :)

    Once you accept that you will learn that your view on Threads and how to use them will take a twist into a formerly-absurd territory. And from there on everything will be fine :)
