Implementing microservices with Python

Python is an amazingly versatile language.

As you probably already know, it's used to build many different kinds of applications, from simple system scripts that perform tasks on a server to large object-oriented applications that run services for millions of users.

According to a study conducted by Philip Guo in 2014 and published on the Association for Computing Machinery (ACM) website, Python has surpassed Java to become the most popular language for teaching introductory computer science at top U.S. universities.

This trend is also true in the software industry. Python now sits in the top five languages of the TIOBE index (http://www.tiobe.com/tiobe-index/), and it's probably even more prominent in web development, since languages like C are rarely used as the main language for building web applications.

This book assumes that you are already familiar with the Python programming language. If you are not an experienced Python developer, you can read Expert Python Programming, Second Edition, which teaches advanced programming skills in Python.

However, some developers criticize Python for being slow and unfit for building efficient web services. Python is slow, and this is undeniable. But it is still a language of choice for building microservices, and many major companies are happily using it.

This section will give you some background on the different ways you can write microservices using Python, offer some insights on asynchronous versus synchronous programming, and conclude with some details on Python's performance.

This section is composed of five parts:

The WSGI standard
Greenlet and Gevent
Twisted and Tornado
asyncio
Language performances

The WSGI standard

What strikes most web developers who start with Python is how easy it is to get a web application up and running.

The Python web community has created a standard (inspired by the Common Gateway Interface, or CGI) called the Web Server Gateway Interface (WSGI). It greatly simplifies writing a Python application that serves HTTP requests.

When your code follows that standard, your project can be executed by standard web servers like Apache or nginx, using WSGI extensions like uWSGI or mod_wsgi.

Your application just has to deal with incoming requests and send back JSON responses, and Python includes all that goodness in its standard library.

You can create a fully functional microservice that returns the server's local time with a vanilla Python module of fewer than 10 lines, as follows:

    import json
    import time 
 
    def application(environ, start_response): 
        headers = [('Content-type', 'application/json')] 
        start_response('200 OK', headers) 
        return [bytes(json.dumps({'time': time.time()}), 'utf8')]
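To see what a WSGI server actually does with this function, it can help to play the server's role yourself. The following sketch relies only on the standard library's wsgiref.util.setup_testing_defaults() to build a minimal environ dictionary, passes in a start_response callable, and joins the returned byte strings into the response body. The call_wsgi_app() helper is a hypothetical name used for illustration, and its start_response is simplified: a real server's version also returns a write() callable.

```python
import json
import time
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    headers = [('Content-type', 'application/json')]
    start_response('200 OK', headers)
    return [bytes(json.dumps({'time': time.time()}), 'utf8')]


def call_wsgi_app(app, path='/'):
    """Hypothetical helper playing the server's role: build an environ,
    pass a start_response callable, collect status, headers, and body."""
    environ = {'PATH_INFO': path}
    setup_testing_defaults(environ)  # fills in the required CGI and wsgi.* keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Simplified: a real server's start_response also returns write().
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call_wsgi_app(application)
print(status, headers, json.loads(body))
```

To serve the same function over HTTP during development, the standard library's wsgiref.simple_server.make_server('', 8000, application).serve_forever() is enough; production deployments would go through a server such as uWSGI or mod_wsgi instead.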

Since its introduction, the WSGI protocol has become an essential standard, and the Python web community has widely adopted it. Developers wrote middleware: functions that wrap the WSGI application function so they can do something before or after it runs, such as modifying the environment it receives.
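A middleware is itself a WSGI application that wraps another one. The sketch below is a hypothetical example, not taken from any particular library: it stores a generated request ID in the environ before calling the wrapped app, and appends it as a response header by intercepting start_response. The environ key myapp.request_id and the X-Request-Id header are names chosen for illustration.

```python
import json
import uuid
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    # The app reads a value that the middleware stored in the environ.
    body = json.dumps({'request_id': environ['myapp.request_id']})
    start_response('200 OK', [('Content-type', 'application/json')])
    return [body.encode('utf8')]


def request_id_middleware(app):
    """Hypothetical middleware: tags each request with an ID before the
    app runs, and echoes that ID back as a response header."""
    def wrapper(environ, start_response):
        request_id = uuid.uuid4().hex
        environ['myapp.request_id'] = request_id  # before: enrich the environ

        def start_response_with_id(status, headers, exc_info=None):
            # after: the app has chosen its headers; append ours
            headers = list(headers) + [('X-Request-Id', request_id)]
            return start_response(status, headers, exc_info)

        return app(environ, start_response_with_id)
    return wrapper


# Wrapping is plain function composition; a server would be given `app`.
app = request_id_middleware(application)

# Exercise the wrapped app in-process with a stand-in server.
environ = {}
setup_testing_defaults(environ)
captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured['status'] = status
    captured['headers'] = dict(headers)


body = json.loads(b''.join(app(environ, fake_start_response)))
print(captured['headers']['X-Request-Id'] == body['request_id'])  # True
```

Because the wrapper has the same signature as the application, middleware can be stacked, each layer wrapping the next.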

Some web frameworks, like Bottle (http://bottlepy.org), were created specifically around that standard, and soon enough, every framework out there could be used through WSGI in one way or another.

The biggest problem with WSGI though is its synchronous nature. The application function you saw in the preceding code is called exactly once per incoming request, and when the function returns, it has to send back the response. That means that every time you call the function, it will block until the response is ready.

And writing microservices means your code will constantly have to wait for responses from various network resources. In other words, your application will sit idle and just block the client until everything is ready.

That's entirely acceptable behavior for HTTP APIs; we're not talking about building bidirectional applications like WebSocket-based ones. But what happens when several incoming requests call your application at the same time?

WSGI servers let you run a pool of threads to serve several requests concurrently. But you can't run thousands of them, and as soon as the pool is exhausted, the next request has to wait, even if your microservice is doing nothing but idling while it waits for backend services' responses.
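The effect of pool exhaustion can be sketched with the standard library's ThreadPoolExecutor standing in for a WSGI server's worker pool. This is an illustration under stated assumptions: handle_request() is a hypothetical handler that only sleeps to simulate waiting on a backend, and the pool size and durations are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    # Hypothetical handler: it does no work, it only waits on a "backend".
    time.sleep(0.2)
    return request_id


def serve(num_requests, pool_size):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(handle_request, range(num_requests)))
    return results, time.perf_counter() - start


# Four concurrent requests, but only two worker threads.
results, elapsed = serve(num_requests=4, pool_size=2)
print(results, f'{elapsed:.2f}s')
```

With four requests and two threads, the last two requests cannot start until a thread frees up, so the total is at least two handler durations (about 0.4 seconds) even though every handler spends all its time waiting.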

That's one of the reasons why non-WSGI frameworks like Twisted and Tornado, and Node.js in JavaScript land, became very successful: they are fully asynchronous.

When you're coding a Twisted application, you can use callbacks to pause and resume the work done to build a response. That means that you can accept new requests and start processing them in the meantime. That model dramatically reduces the idle time in your process, which can serve thousands of concurrent requests. Of course, that does not mean the application will return each individual response faster. It just means one process can accept more concurrent requests and juggle between them as the data becomes ready to be sent back.
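Twisted's own APIs are not shown here. As a hedged illustration of the same single-process model, this sketch uses the standard library's asyncio module (covered later in this section): 1,000 simulated requests, each waiting 0.2 seconds on a hypothetical backend, share one thread because each one pauses while it waits.

```python
import asyncio
import time


async def handle_request(request_id):
    # Stand-in for waiting on a backend; while this coroutine is paused
    # at the await, the event loop runs the other requests.
    await asyncio.sleep(0.2)
    return request_id


async def serve(num_requests):
    start = time.perf_counter()
    results = await asyncio.gather(
        *(handle_request(i) for i in range(num_requests))
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(serve(1000))
print(len(results), f'{elapsed:.2f}s')
```

On a typical machine the whole batch finishes in roughly the time of a single request, since no request holds the thread while it waits. Each individual response is not any faster, which matches the point above.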

There's no simple way to introduce something similar into the WSGI standard, and the community has debated for years to reach a consensus, and failed. The odds are that the community will eventually drop the WSGI standard for something else.

In the meantime, building microservices with synchronous frameworks is still possible and completely fine if your deployments take into account the one request == one thread limitation of the WSGI standard.

There is, however, one trick to boost synchronous web applications: Greenlet, which is explained in the following section.

Greenlet and Gevent

The general principle of asynchronous programming is that the process deals with several concurrent execution contexts to simulate parallelism.

Asynchronous applications use an event loop that pauses and resumes execution contexts when an event is triggered. Only one context is active at a time, and they take turns. Explicit instructions in the code tell the event loop where it can pause the execution.

When that occurs, the process looks for some other pending work to resume. Eventually, the process comes back to your function and continues it where it stopped. Moving from one execution context to another is called switching.
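The pause, resume, and switch cycle can be sketched with plain generators: each generator is an execution context, and each yield is an explicit pause point. This is a toy for illustration only; task() and run() are hypothetical names, and real event loops resume contexts when an event fires (a socket becomes readable, a timer expires) rather than in strict round-robin order.

```python
from collections import deque


def task(name, steps, log):
    for i in range(steps):
        log.append(f'{name}:{i}')
        yield  # explicit pause point: hand control back to the loop


def run(tasks):
    """Toy round-robin loop: resume each paused task in turn."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)       # resume until the next pause point
        except StopIteration:
            continue            # the task finished; drop it
        ready.append(current)   # still alive: requeue it (a "switch")


log = []
run([task('A', 2, log), task('B', 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Each task keeps its local state (here, the loop counter i) between pauses, which is what lets the loop come back to it and continue where it stopped.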

The Greenlet project (https://github.com/python-greenlet/greenlet) is a package based on Stackless, a modified CPython implementation, and provides greenlets.

Greenlets are pseudo-threads that, unlike real threads, are very cheap to instantiate, and that can be used to call Python functions. Within those functions, you can switch, giving control back to another function. The switching itself is explicit; combined with an event loop that decides when to switch, it allows you to write an asynchronous application using a thread-like programming model.

Here's an example from the Greenlet documentation:

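    from greenlet import greenlet


    def test1(x, y):
        z = gr2.switch(x + y)  # jump into test2 with "hello world"
        print(z)               # resumes here when test2 switches back with 42


    def test2(u):
        print(u)
        gr1.switch(42)         # jump back into test1


    gr1 = greenlet(test1)
    gr2 = greenlet(test2)
    gr1.switch("hello", " world")  # prints "hello world", then 42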

The two greenlets in the preceding example explicitly switch from one to the other.

For building microservices based on the WSGI standard, if the underlying code used greenlets, we could accept several concurrent requests and just switch from one to another whenever we know a call is going to block the request, such as an I/O request.

However, switching from one greenlet to another has to be done explicitly, and the resulting code can quickly become messy and hard to understand. That's where Gevent can become very useful.

The Gevent project (http://www.gevent.org/) is built on top of Greenlet, and offers an implicit and automatic way of switching between greenlets, among many other things.

It provides a cooperative version of the socket module, which uses greenlets to automatically pause and resume execution when data becomes available on the socket. There's even a monkey patch feature, which automatically replaces the standard library socket module with Gevent's version. That makes your standard synchronous code magically asynchronous every time it uses sockets, with just one extra line:

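    from gevent import monkey; monkey.patch_all()

    import json
    import time


    def application(environ, start_response):
        headers = [('Content-type', 'application/json')]
        start_response('200 OK', headers)
        # ...do something with sockets here: once patched, blocking socket
        # calls yield to other greenlets instead of blocking the process...
        result = [bytes(json.dumps({'time': time.time()}), 'utf8')]
        return result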

This implicit magic comes at a price, though. For Gevent to work well, all the underlying code needs to be compatible with the patching that Gevent does. Some packages from the community will continue to block, or even produce unexpected results, because of this, in particular if they use C extensions and bypass the parts of the standard library that Gevent patches.

But it works well in most cases. Projects that play well with Gevent are dubbed green, and when a library does not, and the community asks its authors to make it green, they usually do.

Gevent is what was used to scale the Firefox Sync service at Mozilla, for instance.

Twisted and Tornado

If you are building microservices where increasing the number of concurrent requests you can handle is important, it's tempting to drop the WSGI standard and just use an asynchronous framework like Tornado (http://www.tornadoweb.org/) or Twisted (https://twistedmatrix.com/trac/).

Twisted has been around for ages. To implement the same microservice with it, you need slightly more verbose code, like this:

    import json
    import time

    from twisted.internet import endpoints, reactor
    from twisted.web import resource, server


    class Simple(resource.Resource):
        isLeaf = True

        def render_GET(self, request):
            request.responseHeaders.addRawHeader(b"content-type",
                                                 b"application/json")
            return bytes(json.dumps({'time': time.time()}), 'utf8')


    # Module level: serve the resource on port 8080 and start the event loop.
    site = server.Site(Simple())
    endpoint = endpoints.TCP4ServerEndpoint(reactor, 8080)
    endpoint.listen(site)
    reactor.run()

While Twisted is an extremely robust and efficient framework, it suffers from a few problems when building HTTP microservices, which are as follows:

You need to implement each endpoint in your microservice with a class derived from the Resource class that implements each supported method. For a few simple APIs, this adds a lot of boilerplate code.
Twisted code can be hard to understand and debug due to its asynchronous nature.
It's easy to fall into callback hell when you chain too many functions that get triggered one after the other, and the code can get messy.
Properly testing your Twisted application is hard, and you have to use a Twisted-specific unit testing model.

Tornado is based on a similar model, but does a better job in some areas. It has a lighter routing system, and does everything possible to make the code closer to plain Python. Tornado also uses a callback model, so debugging can be hard.

But both frameworks are working hard to bridge the gap and rely on the new async features introduced in Python 3.

asyncio

When Guido van Rossum started to work on adding async features in Python 3, part of the community pushed for a Gevent-like solution, because it made a lot of sense to write applications in a synchronous, sequential fashion rather than having to add explicit callbacks like in Tornado or Twisted.

But Guido picked the explicit technique and experimented with it in a project called Tulip, inspired by Twisted. Eventually, the asyncio module was born out of that side project and added to the standard library in Python 3.4.

In hindsight, implementing an explicit event loop mechanism in Python instead of going the Gevent way makes a lot of sense. The way the Python core developers coded asyncio, and how they elegantly extended the language with the async and await keywords to implement coroutines, made asynchronous applications built with vanilla Python 3.5+ code look very elegant and close to synchronous programming.

Coroutines are functions that can suspend and resume their execution. Chapter 12, What Next?, explains in detail how they are implemented in Python and how to use them.
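A minimal sketch of that suspend-and-resume behavior with asyncio: each worker() coroutine (a hypothetical name) pauses at await asyncio.sleep(0), which hands control back to the event loop, and later resumes with its local state intact. The recorded log shows the two coroutines interleaving.

```python
import asyncio


async def worker(name, log):
    log.append(f'{name} start')
    await asyncio.sleep(0)      # suspension point: control returns to the loop
    log.append(f'{name} end')   # resumed later, locals intact
    return name


async def main():
    log = []
    names = await asyncio.gather(worker('A', log), worker('B', log))
    return names, log


names, log = asyncio.run(main())
print(log)  # ['A start', 'B start', 'A end', 'B end']
```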

By doing this, Python did a great job of avoiding the callback syntax mess we sometimes see in Node.js or Twisted (Python 2) applications.
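To make the contrast concrete, here is the same two-step flow (look up a user, then their orders) written both ways with asyncio. It is an illustration only: fetch_user, fetch_orders, and their callback-style variants are hypothetical backend calls simulated with short timers, not a real client library.

```python
import asyncio


# Callback style: each step hands its result to the next function.
def fetch_user_cb(loop, user_id, callback):
    # Hypothetical backend call, simulated with a 10 ms timer.
    loop.call_later(0.01, callback, {'id': user_id, 'name': 'alice'})


def fetch_orders_cb(loop, user, callback):
    loop.call_later(0.01, callback, [{'user_id': user['id'], 'total': 42}])


def handle_with_callbacks(loop, user_id, done):
    def on_user(user):
        def on_orders(orders):
            done({'user': user['name'], 'orders': len(orders)})
        fetch_orders_cb(loop, user, on_orders)
    fetch_user_cb(loop, user_id, on_user)


# Coroutine style: the same flow reads top to bottom.
async def fetch_user(user_id):
    await asyncio.sleep(0.01)
    return {'id': user_id, 'name': 'alice'}


async def fetch_orders(user):
    await asyncio.sleep(0.01)
    return [{'user_id': user['id'], 'total': 42}]


async def handle_with_await(user_id):
    user = await fetch_user(user_id)
    orders = await fetch_orders(user)
    return {'user': user['name'], 'orders': len(orders)}


async def main():
    loop = asyncio.get_running_loop()
    # Bridge the callback version into a Future so it can be awaited too.
    finished = loop.create_future()
    handle_with_callbacks(loop, 1, finished.set_result)
    via_callbacks = await finished
    via_await = await handle_with_await(1)
    return via_callbacks, via_await


via_callbacks, via_await = asyncio.run(main())
print(via_callbacks == via_await)  # True
```

Both versions produce the same result, but every extra step in the callback version adds another nesting level, while the coroutine version just gains one more await line.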

And beyond coroutines, Python 3 has introduced a full set of features and helpers in the asyncio package to build asynchronous applications; refer to https://docs.python.org/3/library/asyncio.html.

Python is now as expressive as languages like Lua for creating coroutine-based applications, and a few emerging frameworks have embraced those features and will only work with Python 3.5+ to benefit from them.

KeepSafe's aiohttp (http://aiohttp.readthedocs.io) is one of them, and building the same microservice with it, fully asynchronous, takes only these few elegant lines:

    from aiohttp import web  
    import time 
 
    async def handle(request): 
        return web.json_response({'time': time.time()}) 
 
    if __name__ == '__main__': 
        app = web.Application() 
        app.router.add_get('/', handle) 
        web.run_app(app)

In this small example, we're very close to how we would implement a synchronous app. The only hint that we're using async is the async keyword, which marks the handle function as a coroutine.

And that's what's going to be used at every level of an async Python app going forward. Here's another example, from the project documentation, using aiopg, a PostgreSQL library for asyncio:

    import asyncio
    import aiopg

    dsn = 'dbname=aiopg user=aiopg password=passwd host=127.0.0.1'

    async def go():
        pool = await aiopg.create_pool(dsn)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                await cur.execute("SELECT 1")
                ret = []
                async for row in cur:
                    ret.append(row)
                assert ret == [(1,)]
        # Release the pool's connections before the event loop shuts down.
        pool.close()
        await pool.wait_closed()

    # Python 3.7+; replaces get_event_loop() and run_until_complete().
    asyncio.run(go())

With a few async and await prefixes, the function that performs an SQL query and reads back the result looks a lot like a synchronous function.

But asynchronous frameworks and libraries based on Python 3 are still emerging, and if you are using asyncio or a framework like aiohttp, you will need to stick with particular asynchronous implementations for each feature you need.

If you need to use a library that is not asynchronous, calling it from your asynchronous code without blocking the event loop requires some extra, and sometimes challenging, work.
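One common approach is to push the blocking call into a worker thread so the event loop stays free. The sketch below uses asyncio.to_thread() (Python 3.9+; older versions can use loop.run_in_executor(None, func, *args)). Here blocking_lookup() is a hypothetical stand-in for a synchronous library call, and the heartbeat() coroutine only exists to show that the loop keeps running meanwhile.

```python
import asyncio
import time


def blocking_lookup(key):
    # Hypothetical stand-in for a synchronous library call (e.g. a blocking driver).
    time.sleep(0.2)
    return key.upper()


async def heartbeat(ticks):
    # Keeps ticking only if the event loop is not blocked.
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)


async def main():
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    start = time.perf_counter()
    # Each blocking call runs in the default thread pool executor.
    results = await asyncio.gather(
        *(asyncio.to_thread(blocking_lookup, key) for key in ('a', 'b', 'c'))
    )
    elapsed = time.perf_counter() - start
    beat.cancel()
    return results, elapsed, len(ticks)


results, elapsed, tick_count = asyncio.run(main())
print(results, f'{elapsed:.2f}s', tick_count)
```

Calling blocking_lookup() directly inside a coroutine would instead freeze every other request, including the heartbeat, for the full duration of each call. The thread-based workaround still inherits the thread pool limits of the synchronous world, which is part of why this work can be challenging.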

If your microservices deal with a limited number of resources, it could be manageable. But it's probably a safer bet at the time of this writing to stick with a synchronous framework that's been around for a while rather than an asynchronous one. Let's enjoy the existing ecosystem of mature packages, and wait until the asyncio ecosystem gets more sophisticated.

And there are many great synchronous frameworks to build microservices with Python, like Bottle, Pyramid with Cornice, or Flask.

There's a good chance that the second edition of this book will use an asynchronous framework. But for this edition, we'll use the Flask framework throughout the book. It's been around for some time and is very robust and mature. However, keep in mind that whatever Python web framework you use, you should be able to transpose all the examples in this book. This is because most of the coding involved when building microservices is very close to plain Python, and the framework's job is mostly to route the requests and offer a few helpers.

Language performances

In the previous sections, we've been through the two different ways to write microservices: asynchronous and synchronous. Whichever technique you use, the speed of Python directly impacts the performance of your microservice.

Of course, everyone knows Python is slower than Java or Go, but execution speed is not always the top priority. A microservice is often a thin layer of code that spends most of its life waiting for network responses from other services. Its core speed is usually less important than how long your SQL queries take to return from your Postgres server, because the latter represents most of the time spent building the response.

But wanting an application that's as fast as possible is legitimate.

One controversial topic in the Python community around speeding up the language is how the Global Interpreter Lock (GIL) mutex can hurt performance, because it prevents the threads of a multithreaded application from executing Python code on several CPU cores at the same time.

The GIL has good reasons to exist. It protects non-thread-safe parts of the CPython interpreter, and exists in other languages like Ruby. And all attempts to remove it so far have failed to produce a faster CPython implementation.

Larry Hastings is working on a GIL-free CPython project called Gilectomy (https://github.com/larryhastings/gilectomy). Its minimal goal is to come up with a GIL-free implementation that can run a single-threaded application as fast as CPython. As of the time of this writing, this implementation is still slower than CPython. But it's interesting to follow this work and see if it reaches speed parity one day. That would make a GIL-free CPython very appealing.

For microservices, besides preventing the use of multiple cores in the same process, the GIL slightly degrades performance under high load because of the system call overhead introduced by the mutex.

However, all the scrutiny around the GIL has been beneficial: work has been done in the past years to reduce GIL contention in the interpreter, and in some areas, Python's performance has improved a lot.

Bear in mind that even if the core team removes the GIL, Python is an interpreted, garbage-collected language and pays a performance penalty for those properties.

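Python provides the dis module if you are interested in seeing how the interpreter decomposes a function. In the following example, the interpreter decomposes a simple function that yields incremented values from a sequence into 14 bytecode instructions, most of which run on every iteration of the loop. The second column shows byte offsets; this listing comes from an older CPython 3 release, and newer versions produce different bytecode: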

    >>> def myfunc(data):
    ...     for value in data:
    ...         yield value + 1
    ...
    >>> import dis
    >>> dis.dis(myfunc)
      2           0 SETUP_LOOP              23 (to 26)
                  3 LOAD_FAST                0 (data)
                  6 GET_ITER
            >>    7 FOR_ITER                15 (to 25)
                 10 STORE_FAST               1 (value)

      3          13 LOAD_FAST                1 (value)
                 16 LOAD_CONST               1 (1)
                 19 BINARY_ADD
                 20 YIELD_VALUE
                 21 POP_TOP
                 22 JUMP_ABSOLUTE            7
            >>   25 POP_BLOCK
            >>   26 LOAD_CONST               0 (None)
                 29 RETURN_VALUE
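Since the exact listing depends on the interpreter version, it can be more useful to inspect the bytecode programmatically on the version you actually run. This sketch uses dis.get_instructions(), which yields one Instruction object per bytecode operation. It only prints the counts and opcode names, because those vary between releases.

```python
import dis


def myfunc(data):
    for value in data:
        yield value + 1


# One Instruction object per bytecode operation; the list varies by version.
instructions = list(dis.get_instructions(myfunc))
opnames = [ins.opname for ins in instructions]

print(len(instructions))
print(opnames)
print(list(myfunc([1, 2, 3])))  # [2, 3, 4]
```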

A similar function written in a statically compiled language would need dramatically fewer operations to produce the same result. There are ways to speed up Python execution, though.

One is to write parts of your code as compiled code, either by building C extensions or by using a static extension of the language like Cython (http://cython.org/), but that makes your code more complicated.

Another solution, and the most promising one, is simply to run your application with the PyPy interpreter (http://pypy.org/).

PyPy implements a Just-In-Time (JIT) compiler. At runtime, this compiler replaces pieces of Python with machine code that the CPU can execute directly. The whole trick for the JIT is to detect, in real time and ahead of execution, when and how to do it.

Even if PyPy is always a few Python versions behind CPython, it has reached a point where you can use it in production, and its performance can be quite amazing. In one of our projects at Mozilla that needed fast execution, the PyPy version was almost as fast as the Go version, and we decided to use Python there instead.

The PyPy Speed Center website (http://speed.pypy.org/) is a great place to look at how PyPy compares to CPython.

However, if your program uses C extensions, you will need to recompile them for PyPy, and that can be a problem, in particular if other developers maintain some of the extensions you are using.

But if you build your microservice with a standard set of libraries, chances are that it will work out of the box with the PyPy interpreter, so that's worth a try.
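A cheap way to try this is to run the same script under CPython and then under PyPy (the PyPy binary is commonly named pypy3) and compare. The sketch below reports which interpreter is running via platform.python_implementation() and times a small loop with timeit. The benchmarked function and the repetition count are arbitrary choices, and no particular numbers are claimed: results depend on your machine and interpreter versions, and a micro-benchmark says little about a real service.

```python
import platform
import sys
import timeit


def myfunc(data):
    for value in data:
        yield value + 1


def bench(number=50):
    # Total seconds for `number` full passes over the generator; lower is faster.
    return timeit.timeit(lambda: sum(myfunc(range(10_000))), number=number)


implementation = platform.python_implementation()  # e.g. 'CPython' or 'PyPy'
elapsed = bench()
print(f'{implementation} {sys.version.split()[0]}: {elapsed:.3f}s')
```

A JIT typically needs some warm-up before it kicks in, so longer-running benchmarks, or the service's own load tests, give a fairer comparison than a single short run.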

In any case, for most projects, the benefits of Python and its ecosystem largely surpass the performance issues described in this section, because the overhead in a microservice is rarely a problem. And if performance is a problem, the microservice approach allows you to rewrite performance-critical components without affecting the rest of the system.