Deferreds

Motivation

Dealing with Blocking Code

When coding I/O based programs - networking code, databases, file access - there are many APIs that are blocking, and many methods where the common idiom is to block until a result is available.

class Getter:

    def getData(self, x):
        # blockUntilResult() does not return until the result is available
        result = self.blockUntilResult(x)
        return result

g = Getter()
print(g.getData(3))

Don't Call Us, We'll Call You

Unfortunately, Twisted cannot support blocking calls in most of its code, since it is single-threaded and event-based. The solution is to refactor the code so that, instead of blocking until data is available, we return immediately and use a callback to notify the requester once the data eventually arrives. However, plain callbacks like this are harder to use, do not deal with errors raised while waiting for the data, and make connecting such classes together rather difficult. Nonetheless, looking at how this is implemented will help us understand Deferreds.

class Getter:

    def getData(self, x, callback):
        self.callback = callback
        # this call does not block; it ensures self.gotResult is called
        # when we have the result
        self.onResult(x, self.gotResult)
    
    def gotResult(self, result):
        self.callback(result)

def gotData(d):
    print(d)

g = Getter()
g.getData(3, gotData)

# run the program's event loop here
from twisted.internet import reactor
reactor.run()

High Level Overview

Deferreds

A twisted.internet.defer.Deferred is a promise that a function will at some point have a result. We can attach callback functions to a Deferred, and once it gets a result these callbacks will be called. In addition, Deferreds allow the developer to register a callback for an error, with the default behavior of logging the error. This is an asynchronous equivalent of the common idiom of blocking until a result is returned or an exception is raised.

As we said, multiple callbacks can be added to a Deferred. The first callback in the Deferred's callback chain will be called with the result, the second with the result of the first callback, and so on. Why do we need this? Well, consider a Deferred returned by twisted.enterprise.adbapi - the result of a SQL query. A web widget might add a callback that converts this result into HTML, and pass the Deferred onwards, where Twisted will use a further callback to return the result to the HTTP client.

In order for a Deferred to pass its results to a registered callback function, it needs to be armed. Frameworks that support Deferreds - twisted.web.widgets, twisted.web.xmlrpc, twisted.spread.pb server-side objects - will arm the Deferred automatically for you.

import sys
from twisted.internet import defer

class Getter:

    def getResult(self, x):
        self.d = defer.Deferred()
        self.doNonblockingStuff(x)
        return self.d
    
    def gotResult(self, result):
        """Called when we get some info from somewhere via the event loop.

        E.g. this may be called because we got a chunk of data off a socket.
        """
        if self.goodResult(result):
            # tell the Deferred that we have a result for it
            self.d.callback(result)
        else:
            # tell the Deferred that we have an error
            self.d.errback("An error has occurred.")

def printData(d): sys.stdout.write(d)
def printError(e): sys.stderr.write(e)

g = Getter()
d = g.getResult(3) # notice how this is similar to the blocking version
d.addCallback(printData) # printData will be called when a result is available
d.addErrback(printError) # printError will be called on an error
d.arm()

# run main event loop here
from twisted.internet import reactor
reactor.run()

More about callbacks

You can add multiple callbacks to a Deferred:

g = Getter()
d = g.getResult(3)
d.addCallback(processResult)
d.addCallback(printResult)
d.arm()

Each callback feeds its return value into the next callback (callbacks are called in the order you add them). Thus, in the previous example, processResult's return value will be passed to printResult instead of the value initially passed into the callback chain. This gives you a flexible way to chain results together, possibly modifying values along the way (for example, you may wish to pre-process database query results).
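The hand-off between callbacks can be modelled without Twisted at all. What follows is a minimal sketch in plain Python, not Twisted's implementation: run_chain, the bodies of processResult and printResult, and the sample rows are all hypothetical, chosen only to show each return value becoming the next callback's argument.

```python
def run_chain(result, callbacks):
    """Toy model of a callback chain (not Twisted's implementation)."""
    for callback in callbacks:
        # Each callback receives whatever the previous one returned.
        result = callback(result)
    return result


def processResult(rows):
    # Hypothetical pre-processing step: keep only the names.
    return [name for (name, age) in rows]


def printResult(names):
    # Receives processResult's return value, not the original rows.
    print(", ".join(names))
    return names


rows = [("alice", 30), ("bob", 25)]  # made-up query result
final = run_chain(rows, [processResult, printResult])
```

Swapping the order of the two callbacks would hand the raw rows to printResult, which is why the order in which callbacks are added matters.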

More about errbacks

Deferred's error handling is modelled after Python's exception handling. In the case that no errors occur, all the callbacks run, one after the other, as described above.

If the errback is called instead of the callback (e.g. because a DB query raised an error), then a twisted.python.failure.Failure is passed into the first errback (you can add multiple errbacks, just like with callbacks). You can think of your errbacks as being like except blocks of ordinary Python code.

Unless you explicitly raise an error in an except block, the exception is caught and stops propagating, and normal execution continues. The same thing happens with errbacks: unless you explicitly return a Failure or (re-)raise an exception, the error stops propagating, and normal callbacks continue executing from that point (using the value returned from the errback). If the errback does return a Failure or raises an exception, then that is passed to the next errback, and so on.

Note: If an errback doesn't return anything, then it effectively returns None, meaning that callbacks will continue to be executed after this errback. This may not be what you expect to happen, so be careful. Make sure your errbacks return a Failure (probably the one that was passed to it), or a meaningful return value for the next callback.
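To make the pitfall concrete, here is a small sketch of the propagation rules in plain Python. It is a toy model under stated assumptions, not Twisted's implementation: run_chain and passthru are hypothetical helpers, an Exception instance stands in for a Failure, and the handler names are invented for illustration.

```python
def passthru(value):
    return value


def run_chain(pairs, result):
    """Toy model of a callback/errback chain (not Twisted's implementation).

    pairs  -- list of (callback, errback) tuples, in the order they were added
    result -- a plain value, or an Exception instance standing in for a Failure
    """
    for callback, errback in pairs:
        # An error result goes to the errback, anything else to the callback.
        handler = errback if isinstance(result, Exception) else callback
        try:
            result = handler(result)
        except Exception as exc:
            # Raising switches the chain onto the error path.
            result = exc
    return result


log = []

def failingQuery(_):
    raise ValueError("query failed")

def logOnly(error):
    log.append("logged: %s" % error)
    # No return statement: None is returned and the error is swallowed.

def logAndPropagate(error):
    log.append("logged: %s" % error)
    return error  # stays on the error path

def render(rows):
    log.append("render got %r" % (rows,))
    return rows


swallowed = run_chain(
    [(failingQuery, passthru), (passthru, logOnly), (render, passthru)], 3)
propagated = run_chain(
    [(failingQuery, passthru), (passthru, logAndPropagate), (render, passthru)], 3)
```

In the first run render is called with None, because logOnly returned nothing; in the second run render is skipped and the error is still pending at the end of the chain.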

Also, twisted.python.failure.Failure instances have a useful method called trap, allowing you to effectively do the equivalent of:

try:
    # code that may throw an exception
    cookSpamAndEggs()
except (SpamException, EggException):
    # Handle SpamExceptions and EggExceptions
    pass

You do this by:

def errorHandler(failure):
    failure.trap(SpamException, EggException)
    # Handle SpamExceptions and EggExceptions

d.addCallback(cookSpamAndEggs)
d.addErrback(errorHandler)

If none of the arguments passed to failure.trap match the error encapsulated in that Failure, then it re-raises the error.

Note: There's another potential "gotcha" here. There's a convenience method twisted.internet.defer.Deferred.addCallbacks which is similar to, but not exactly the same as, addCallback followed by addErrback. In particular, consider these two cases:

# Case 1
d = getDeferredFromSomewhere()
d.addCallback(callback1)
d.addErrback(errback1)
d.addCallback(callback2)
d.addErrback(errback2)

# Case 2
d = getDeferredFromSomewhere()
d.addCallbacks(callback1, errback1)
d.addCallbacks(callback2, errback2)

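If an error occurs in callback1, then for Case 1 errback1 will be called with the failure. For Case 2, errback2 will be called instead, because addCallbacks registers the callback and errback at the same level of the chain, so errback1 never sees errors raised by callback1. Be careful with your callbacks and errbacks.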
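The difference between the two cases can be traced with a toy stand-in. The sketch below is plain Python and only an approximation under stated assumptions: ToyDeferred, its fire method, passthru and make_handlers are hypothetical names, not part of Twisted, and an Exception instance stands in for a Failure. It treats addCallback and addErrback as addCallbacks with a pass-through in the other slot.

```python
def passthru(value):
    return value


class ToyDeferred:
    """Toy stand-in for a Deferred (not Twisted's implementation)."""

    def __init__(self):
        self.pairs = []

    def addCallbacks(self, callback, errback):
        # One chain level holding both handlers.
        self.pairs.append((callback, errback))

    def addCallback(self, callback):
        self.addCallbacks(callback, passthru)

    def addErrback(self, errback):
        self.addCallbacks(passthru, errback)

    def fire(self, result):
        for callback, errback in self.pairs:
            handler = errback if isinstance(result, Exception) else callback
            try:
                result = handler(result)
            except Exception as exc:
                result = exc
        return result


def make_handlers(calls):
    def callback1(value):
        raise RuntimeError("callback1 failed")
    def errback1(error):
        calls.append("errback1")
        return "recovered"
    def callback2(value):
        calls.append("callback2")
        return value
    def errback2(error):
        calls.append("errback2")
        return "recovered"
    return callback1, errback1, callback2, errback2


# Case 1: four separate levels, so errback1 sits after callback1.
case1_calls = []
callback1, errback1, callback2, errback2 = make_handlers(case1_calls)
d = ToyDeferred()
d.addCallback(callback1)
d.addErrback(errback1)
d.addCallback(callback2)
d.addErrback(errback2)
d.fire(3)

# Case 2: two levels, so errback1 shares a level with callback1.
case2_calls = []
callback1, errback1, callback2, errback2 = make_handlers(case2_calls)
d = ToyDeferred()
d.addCallbacks(callback1, errback1)
d.addCallbacks(callback2, errback2)
d.fire(3)
```

In this model Case 1 runs errback1 and then callback2, while Case 2 skips straight to errback2, since an errback only handles errors coming from earlier levels of the chain.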