As part of Packt’s Python Week, Daniel Arbuckle, author of our Mastering Python video, explains Python’s journey from generators, to schedulers, to cooperative multithreading and beyond.
My romance with cooperative multithreading in Python began in December of 2001, with the release of Python version 2.2. That version of Python contained a fascinating new feature: generators. Fourteen years later, generators are old hat to Python programmers, but at the time, they represented a big conceptual improvement.
While I was playing with generators, I noticed that they were in essence first-class objects that represented both the code and state of a function call. On top of that, the code could pause an execution and then later resume it. That meant that generators were practically coroutines! I immediately set out to write a scheduler to execute generators as lightweight threads.
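To give a feel for the idea, here is a minimal sketch of a round-robin scheduler that treats generators as lightweight threads. It is written for modern Python and is not a reconstruction of any of the early schedulers; the names worker and run_round_robin are made up for illustration.

```python
from collections import deque


def worker(name, steps, log):
    """A generator used as a lightweight thread: each yield hands control back."""
    for i in range(steps):
        log.append((name, i))
        yield  # pause here; the scheduler decides who runs next


def run_round_robin(tasks):
    """Run the given generators in turn until every one of them has finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task up to its next yield
        except StopIteration:
            continue            # the task finished, so drop it
        ready.append(task)      # otherwise send it to the back of the queue


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('b', 2)]
```

The two workers take turns because each yield returns control to the scheduler, which is all that cooperative multithreading really requires.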
I wasn’t the only one!
While the schedulers that I and others wrote worked, there were some significant limitations imposed on them by the language. For example, back then generators didn’t have a send() method, so it was necessary to come up with some other way of getting data from one generator to another. My scheduler got set aside in favor of more productive projects. Fortunately, that’s not where the story ends.
With Python 2.5, Guido van Rossum and Phillip J. Eby added the send() method to generators, turned yield into an expression (it had been a statement before), and made several other changes that made it easier and more practical to treat generators as coroutines, and combine them into cooperatively scheduled threads.
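As a small illustration of what send() and the yield expression make possible, the sketch below passes values into a running generator and gets results back out. The running_average function is a hypothetical example, not something from the standard library.

```python
def running_average():
    """Receive numbers via send() and yield the mean of everything seen so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        # yield is an expression: it evaluates to whatever the caller sends in
        value = yield average
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)            # prime the generator: run it up to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
```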
Python 3.3 was changed to include yield from expressions, which didn’t make much of a difference to end users of cooperative coroutine scheduling, but made the internals of the schedulers dramatically simpler.
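A rough sketch of what yield from does: it forwards every yield and send() between the caller and a sub-generator, and evaluates to the sub-generator's return value. The fetch_pair and task names are invented for this example.

```python
def fetch_pair():
    """Sub-generator: pauses twice, then returns a value to whoever delegates to it."""
    first = yield "need first"
    second = yield "need second"
    return first + second


def task():
    # yield from passes fetch_pair()'s yields out to our caller, passes
    # sent values back in, and evaluates to fetch_pair()'s return value.
    total = yield from fetch_pair()
    yield "total is {}".format(total)


t = task()
print(next(t))    # need first
print(t.send(1))  # need second
print(t.send(2))  # total is 3
```

Without yield from, task() would need its own loop to relay values and exceptions in both directions, which is the kind of plumbing schedulers previously had to carry themselves.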
The next step in the story is Python 3.4, which included the asyncio coroutine scheduler and asynchronous I/O package in the Python standard library. Cooperative multithreading wasn’t just a clever trick anymore. It was a tool in everyone’s box.
All of which brings us to the present, and the recent release of Python 3.5, which includes an explicit coroutine type, distinct from generators, new asynchronous async def, async for, and async with statements, and an await expression that takes the place of yield from for coroutines.
So, why does Python need explicit coroutines and new syntax, if generator-based coroutines had gotten good enough for inclusion in the standard library? The short answer is that generators are primarily for iteration, so using them for something else — no matter how well it works conceptually — introduces ambiguities. For example, if you hand a generator to Python’s for loop, it’s not going to treat it as a coroutine, it’s going to treat it as an iterable.
There’s another problem, related to Python’s special protocol methods, such as __enter__ and __exit__. These are called by code inside the Python interpreter, which leaves the programmer no place to write a yield from around the call. That meant that generator-based coroutines were not compatible with various important bits of Python syntax, such as the with statement. A coroutine couldn’t be called from anything that was called by a special method, whether directly or indirectly, nor was it possible to wait on a future value from such code.
The new changes to Python are meant to address these problems. So, what exactly are these changes?
async def is used to define a coroutine. Apart from the async keyword, the syntax is almost identical to a normal def statement.
The big differences are, first, that coroutines can contain await, async for, and async with syntaxes, and, second, they are not generators, so in Python 3.5 they’re not allowed to contain yield or yield from expressions. It’s impossible for a single function to be both a generator and a coroutine.
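A minimal sketch of async def in action follows. The add function is a made-up example, and asyncio.run is used for brevity; it arrived in Python 3.7, so on 3.5 you would drive the coroutine with an event loop's run_until_complete() instead.

```python
import asyncio
import inspect


async def add(a, b):
    """A coroutine function: calling it builds a coroutine object without running the body."""
    return a + b


coro = add(1, 2)
print(inspect.iscoroutine(coro))  # True
print(asyncio.run(coro))          # 3
```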
await is used to pause the current coroutine until the requested coroutine returns. In other words, an await expression is just like a function call, except that the called function can pause, allowing the scheduler to run some other thread for a while. If you try to use an await expression outside of an async def, Python will raise a SyntaxError.
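The sketch below shows await pausing one coroutine so that another can run. The nap and main names are hypothetical, and asyncio.run again assumes Python 3.7 or later.

```python
import asyncio


async def nap(name, delay, log):
    log.append(name + " start")
    await asyncio.sleep(delay)  # pause here; the scheduler may run other coroutines
    log.append(name + " done")
    return name


async def main():
    log = []
    # gather() runs both coroutines concurrently and waits for both to finish;
    # results come back in the order the coroutines were passed in.
    results = await asyncio.gather(nap("slow", 0.05, log), nap("fast", 0.01, log))
    return results, log


results, log = asyncio.run(main())
print(results)  # ['slow', 'fast']
print(log)
```

Although "slow" starts first, "fast" finishes first, because each await on asyncio.sleep() gives the scheduler a chance to switch.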
async with is used to interface with an asynchronous context manager, which is just like a normal context manager except that instead of __enter__ and __exit__ methods, it has __aenter__ and __aexit__ coroutine methods. Because they’re coroutines, these methods can do things like wait for data to come in over the network, without blocking the whole program.
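Here is a small sketch of an asynchronous context manager. The Connection class is a hypothetical stand-in for a real network resource, and asyncio.sleep(0) stands in for actual I/O; asyncio.run assumes Python 3.7 or later.

```python
import asyncio


class Connection:
    """Hypothetical resource whose setup and teardown both involve waiting."""

    def __init__(self, log):
        self.log = log

    async def __aenter__(self):
        await asyncio.sleep(0)  # stands in for e.g. a network handshake
        self.log.append("opened")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0)  # stands in for flushing and closing
        self.log.append("closed")
        return False            # do not swallow exceptions


async def use_connection():
    log = []
    async with Connection(log) as conn:
        conn.log.append("using")
    return log


print(asyncio.run(use_connection()))  # ['opened', 'using', 'closed']
```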
async for is used to get data from an asynchronous iterable.
Asynchronous iterables have an __aiter__ method, which functions like the normal __iter__ method. (In Python 3.5.0 this was a coroutine method; since Python 3.5.2 it is an ordinary method that returns the asynchronous iterator directly.) __aiter__ should return an object with an __anext__ coroutine method, which can participate in coroutine scheduling and asynchronous I/O before returning the next iterated value, or raising StopAsyncIteration.
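A minimal sketch of an asynchronous iterable, using a made-up Countdown class. Note that it writes __aiter__ as a plain method that returns the iterator directly, which is what Python 3.5.2 and later expect, and that asyncio.run assumes Python 3.7 or later.

```python
import asyncio


class Countdown:
    """Hypothetical asynchronous iterable that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __aiter__(self):
        return self  # this object is its own asynchronous iterator

    async def __anext__(self):
        if self.current <= 0:
            raise StopAsyncIteration  # the async counterpart of StopIteration
        await asyncio.sleep(0)        # stands in for waiting on real I/O
        value = self.current
        self.current -= 1
        return value


async def collect():
    seen = []
    async for n in Countdown(3):
        seen.append(n)
    return seen


print(asyncio.run(collect()))  # [3, 2, 1]
```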
This being Python, all of these new features, and the convenience they represent, are 100% compatible with the existing asyncio scheduler.
Further, as long as you use the @asyncio.coroutine decorator, your existing asyncio code is also forward compatible with these features without any overhead.