Inversion of control and dependency injection
Inversion of Control (IoC) is a simple property of some software designs. According to Wiktionary, if a design exhibits IoC, it means that:
(…) the flow of control in a system is inverted in comparison to the traditional architecture.
But what is the traditional architecture? IoC isn't a new idea, and we can trace it back to at least David D. Clark's 1985 paper, The Structuring of Systems Using Upcalls. This suggests that traditional design refers to the design of software that was common, or thought to be traditional, in the 1980s.
You can access Clark's full paper in a digitized form at https://groups.csail.mit.edu/ana/Publications/PubPDFs/The%20Structuring%20of%20Systems%20Using%20Upcalls.pdf.
Clark describes the traditional architecture of a program as a layered structure of procedures where control always goes from top to bottom. Higher-level layers invoke procedures from lower layers.
Those invoked procedures gain control and can invoke even deeper-layered procedures before returning control upward. In practice, control is traditionally passed from application to library functions. Library functions may pass it deeper to even lower-level libraries but, eventually, return it back to the application.
IoC happens when a library passes control up to the application so that the application can take part in the library behavior. To better understand this concept, consider the following trivial example of sorting a list of integer numbers:
sorted([1,2,3,4,5,6])
The built-in sorted() function takes an iterable of items and returns a list of sorted items. Control goes from the caller (your application) directly to the sorted() function. When the sorted() function is done with sorting, it simply returns the sorted result and gives control back to the caller. Nothing special.
Now let's say we want to sort our numbers in a quite unusual way. That could be, for instance, sorting them by the absolute distance from number 3. Integers closest to 3 should be at the beginning of the result list and the farthest should be at the end. We can do that by defining a simple key function that will specify the order key of our elements:
def distance_from_3(item):
    return abs(item - 3)
Now we can pass that function as the callback key argument to the sorted() function:
sorted([1,2,3,4,5,6], key=distance_from_3)
Now the sorted() function will invoke the key function on every element of the iterable argument. Instead of comparing item values, it will compare the return values of the key function. Here is where IoC happens. The sorted() function "upcalls" back to the distance_from_3() function provided by the application as an argument. Now it is the library that calls functions from the application, and thus the flow of control is reversed.
Callback-based IoC is also humorously referred to as the Hollywood principle in reference to the "don't call us, we'll call you" phrase.
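To make the upcall visible, the following minimal sketch records every invocation that sorted() makes on the key function. The calls list is an illustrative addition and not part of the original example.

```python
calls = []  # illustrative: collects every item passed to the key function


def distance_from_3(item):
    # This function is invoked by sorted(), never directly by our own code.
    calls.append(item)
    return abs(item - 3)


result = sorted([1, 2, 3, 4, 5, 6], key=distance_from_3)
print(result)      # [3, 2, 4, 1, 5, 6]
print(len(calls))  # 6, one upcall per element
```

Items with equal distance (2 and 4, or 1 and 5) keep their input order because Python's sort is stable.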
Note that IoC is just a property of a design and not a design pattern by itself. The sorted() example is the simplest form of callback-based IoC, but IoC can take many different forms. For instance:
- Polymorphism: When a custom class inherits from a base class and base methods are supposed to call custom methods
- Argument passing: When the receiving function is supposed to call methods of the supplied object
- Decorators: When a decorator function calls a decorated function
- Closures: When a nested function calls a function outside of its scope
As you see, IoC is a rather common aspect of object-oriented or functional programming paradigms. And it also happens quite often without you even realizing it. While it isn't a design pattern by itself, it is a key ingredient of many actual design patterns, paradigms, and methodologies. The most notable one is dependency injection, which we will discuss later in this chapter.
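Two of the forms listed above, polymorphism and decorators, can be sketched as follows. All names here (Report, PageViewsReport, logged, add) are hypothetical and serve only as illustration.

```python
import functools


class Report:
    """Base class that owns the overall flow."""

    def render(self) -> str:
        # The base method calls "up" into whatever the subclass provides.
        return f"<h1>{self.title()}</h1>"

    def title(self) -> str:
        raise NotImplementedError


class PageViewsReport(Report):
    def title(self) -> str:
        return "Page views"


def logged(func):
    """Decorator: the wrapper decides when the decorated function runs."""
    log = []

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.append(func.__name__)
        return func(*args, **kwargs)

    wrapper.log = log  # exposed only so the calls can be inspected
    return wrapper


@logged
def add(a, b):
    return a + b


print(PageViewsReport().render())  # <h1>Page views</h1>
print(add(1, 2), add.log)          # 3 ['add']
```

In both cases the code written last (the subclass method, the decorated function) is called by code written earlier, which is the inversion described above.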
Clark's traditional flow of control in procedural programming also happens in object-oriented programming. In object-oriented programs, objects themselves are receivers of control. We can say that control is passed to the object whenever a method of that object is invoked. So the traditional flow of control would require objects to hold full ownership of all dependent objects that are required to fulfill the object's behavior.
Inversion of control in applications
To better illustrate the differences between various flows of control, we will build a small but practical application. It will initially start with a traditional flow of control and later on, we will see if it can benefit from IoC in selected places.
Our use case will be pretty simple and common. We will build a service that can track web page views using so-called tracking pixels and serve page view statistics over an HTTP endpoint. This technique is commonly used in tracking advertisement views or email openings. It can also be useful in situations when you make extensive use of HTTP caching and want to make sure that caching does not affect page view statistics.
Our application will have to track counts of page views in some persistent storage. That will also give us the opportunity to explore application modularity—a characteristic that cannot be implemented without IoC.
What we need to build is a small web backend application that will have two endpoints:
- /track: This endpoint will return an HTTP response with a 1x1 pixel GIF image. Upon request, it will store the Referer header and increase the number of requests associated with that value.
- /stats: This endpoint will read the top 10 most common Referer values received on the /track endpoint and return an HTTP response containing a summary of the results in JSON format.
The Referer header is an optional HTTP header that web browsers use to tell the web server the URL of the origin web page from which the resource is being requested. Take note of the misspelling of the word referrer. The header was first standardized in RFC 1945, Hypertext Transfer Protocol -- HTTP/1.0 (see https://tools.ietf.org/html/rfc1945). When the misspelling was discovered, it was already too late to fix it.
We've already introduced Flask as a simple web microframework in Chapter 2, Modern Python Development Environments, so we will use it here as well. Let's start by importing some modules and setting up module variables that we will use on the way:
from collections import Counter
from http import HTTPStatus
from flask import Flask, request, Response
app = Flask(__name__)
storage = Counter()
PIXEL = (
    b'GIF89a\x01\x00\x01\x00\x80\x00\x00\x00'
    b'\x00\x00\xff\xff\xff!\xf9\x04\x01\x00'
    b'\x00\x00\x00,\x00\x00\x00\x00\x01\x00'
    b'\x01\x00\x00\x02\x01D\x00;'
)
The app variable is the core object of the Flask framework. It represents a Flask web application. We will use it later to register endpoint routes and also run the application development server.
The storage variable holds a Counter instance. It is a convenient data structure from the Standard Library that allows you to track counters of any hashable values. Our ultimate goal is to store page view statistics in a persistent way, but it will be a lot easier to start off with something simpler. That's why we will initially use this variable as our in-memory storage of page view statistics.
Last but not least is the PIXEL variable. It holds a byte representation of a 1x1 transparent GIF image. The actual visual appearance of the tracking pixel does not matter and probably will never change. It is also so small that there's no need to bother with loading it from the filesystem. That's why we are inlining it in our module to fit the whole application in a single Python module.
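As a quick sanity check, the following sketch reads the GIF signature and the logical screen size from the inlined bytes. It relies only on the GIF header layout (a 6-byte signature followed by little-endian 16-bit width and height); the variable names are illustrative.

```python
import struct

PIXEL = (
    b'GIF89a\x01\x00\x01\x00\x80\x00\x00\x00'
    b'\x00\x00\xff\xff\xff!\xf9\x04\x01\x00'
    b'\x00\x00\x00,\x00\x00\x00\x00\x01\x00'
    b'\x01\x00\x00\x02\x01D\x00;'
)

signature = PIXEL[:6]
# Width and height are stored as two little-endian unsigned 16-bit integers.
width, height = struct.unpack("<HH", PIXEL[6:10])

print(signature)      # b'GIF89a'
print(width, height)  # 1 1
```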
Once we're set, we can write code for the /track endpoint handler:
@app.route('/track')
def track():
    try:
        referer = request.headers["Referer"]
    except KeyError:
        return Response(status=HTTPStatus.BAD_REQUEST)
    storage[referer] += 1
    return Response(
        PIXEL, headers={
            "Content-Type": "image/gif",
            "Expires": "Mon, 01 Jan 1990 00:00:00 GMT",
            "Cache-Control": "no-cache, no-store, must-revalidate",
            "Pragma": "no-cache",
        }
    )
We use the extra Expires, Cache-Control, and Pragma headers to control the HTTP caching mechanism. We set them so that they disable any form of caching in most web browser implementations, and in a way that should also disable caching by intermediate proxies. Take careful note of the Expires header value, which is far in the past. In practice, this means that the resource is always considered expired.
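The effect of the Expires value can be checked with the Standard Library alone. This is a minimal sketch that parses the header value as an HTTP date and compares it with the current time; the variable names are illustrative.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

expires = parsedate_to_datetime("Mon, 01 Jan 1990 00:00:00 GMT")
if expires.tzinfo is None:
    # Defensive: treat a naive result as UTC so the comparison below is valid.
    expires = expires.replace(tzinfo=timezone.utc)

now = datetime.now(timezone.utc)
already_expired = expires < now

print(expires.year)     # 1990
print(already_expired)  # True
```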
Flask request handlers typically start with the @app.route(route) decorator that registers the following handler function for the given HTTP route. Request handlers are also known as views. Here we have registered the track() view as a handler of the /track route endpoint. This is the first occurrence of IoC in our application: we register our own handler implementation with the Flask framework. It is the framework that will call back our handlers on incoming requests that match the associated routes.
After the signature, we have simple code for handling the request. We check if the incoming request has the expected Referer header. That's the value the browser uses to say which URI included the requested resource (for instance, the HTML page we want to track). If there's no such header, we will return an error response with a 400 Bad Request HTTP status code.
If the incoming request has the Referer header, we will increase the counter value in the storage variable. The Counter structure has a dict-like interface and allows you to easily modify counter values for keys that haven't been registered yet. In such a case, it will assume that the initial value for the given key was 0. That way we don't need to check whether a specific Referer value was already seen, and that greatly simplifies the code. After increasing the counter value, we return a pixel response that can finally be displayed by the browser.
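The Counter behavior that the handler relies on can be sketched in isolation. The URLs below are made-up sample values.

```python
from collections import Counter

storage = Counter()

# No prior registration is needed: a missing key counts as 0.
storage["http://example.com/a"] += 1
storage["http://example.com/a"] += 1
storage["http://example.com/b"] += 1

# Reading an unknown key returns 0 and does not create an entry.
missing = storage["http://example.com/never-seen"]

print(storage["http://example.com/a"])          # 2
print(missing)                                  # 0
print("http://example.com/never-seen" in storage)  # False
```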
Note that although the storage variable is defined outside the track() function, it is not yet an example of IoC. That's because whoever calls the track() function can't replace the implementation of the storage. We will try to change that in the next iterations of our application.
The code for the /stats endpoint is even simpler:
@app.route('/stats')
def stats():
    return dict(storage.most_common(10))
In the stats() view, we again take advantage of the convenient interface of the Counter object. It provides the most_common(n) method, which returns up to n most common key-value pairs stored in the structure. We immediately convert that to a dictionary. We don't use the Response class because Flask serializes a dictionary returned from a view to JSON and assumes a 200 OK status for the HTTP response.
In order to test our application easily, we finish our script with the simple invocation of the built-in development server:
if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8000)
If you store the application in the tracking.py file, you will be able to start the server using the python tracking.py command. It will start listening on port 8000. If you would like to test the application in your own browser, you can extend it with the following endpoint handler:
@app.route('/test')
def test():
    return """
    <html>
    <head></head>
    <body><img src="/track"></body>
    </html>
    """
If you open the address http://localhost:8000/test several times in your web browser and then go to http://localhost:8000/stats, you will see output similar to the following:
{"http://localhost:8000/test":6}
The problem with the current implementation is that it stores request counters in memory. Whenever the application is restarted, the existing counters will be reset and we'll lose important data. In order to keep the data between restarts, we will have to replace our storage implementation.
There are many options for providing data persistence. We could, for instance, use:
- A simple text file
- The built-in shelve module
- A relational database management system (RDBMS) like MySQL, MariaDB, or PostgreSQL
- An in-memory key-value or data structure storage service like Memcached or Redis
Depending on the context and scale of the workload our application needs to handle, the best solution will be different. If we don't yet know what the best solution is, we can also make the storage pluggable so we can switch storage backends depending on the actual user needs. To do so, we will have to invert the flow of control in our track() and stats() functions.
Good design calls for defining the interface of the object through which control is inverted. The interface of the Counter class seems like a good starting point. It is convenient to use. The only problem is that the += operation can be implemented through either the __add__() or __iadd__() special method. We definitely want to avoid such ambiguity. Also, the Counter class has way too many extra methods and we need only two:
- A method that allows you to increase the counter value by one
- A method that allows you to retrieve the 10 most often requested keys
To keep things simple and readable, we will define our views storage interface as an abstract base class of the following form:
from abc import ABC, abstractmethod
from typing import Dict

class ViewsStorageBackend(ABC):
    @abstractmethod
    def increment(self, key: str): ...

    @abstractmethod
    def most_common(self, n: int) -> Dict[str, int]: ...
From now on, we can provide various implementations of the views storage backend. The following will be the implementation that adapts the previously used Counter class into the ViewsStorageBackend interface:
from collections import Counter
from typing import Dict
from .tracking_abc import ViewsStorageBackend

class CounterBackend(ViewsStorageBackend):
    def __init__(self):
        self._counter = Counter()

    def increment(self, key: str):
        self._counter[key] += 1

    def most_common(self, n: int) -> Dict[str, int]:
        return dict(self._counter.most_common(n))
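What the abstract base class buys us can be seen in a self-contained sketch. It repeats the two classes from above in a single file and adds IncompleteBackend, a hypothetical class that forgets one of the required methods.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Dict


class ViewsStorageBackend(ABC):
    @abstractmethod
    def increment(self, key: str): ...

    @abstractmethod
    def most_common(self, n: int) -> Dict[str, int]: ...


class CounterBackend(ViewsStorageBackend):
    def __init__(self):
        self._counter = Counter()

    def increment(self, key: str):
        self._counter[key] += 1

    def most_common(self, n: int) -> Dict[str, int]:
        return dict(self._counter.most_common(n))


class IncompleteBackend(ViewsStorageBackend):
    """Hypothetical backend that does not implement most_common()."""

    def increment(self, key: str):
        pass


backend = CounterBackend()
for referer in ["/a", "/b", "/a"]:
    backend.increment(referer)
print(backend.most_common(1))  # {'/a': 2}

try:
    IncompleteBackend()  # abstract method missing, so this cannot be created
except TypeError as error:
    failure = str(error)
    print(failure)
```

The check happens at instantiation time, not at class definition time, and it only verifies that the methods exist, not that their signatures match.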
If we would like to provide persistency through the Redis in-memory storage service, we could do so by implementing a new storage backend as follows:
from typing import Dict
from redis import Redis
from .tracking_abc import ViewsStorageBackend

class RedisBackend(ViewsStorageBackend):
    def __init__(
        self,
        redis_client: Redis,
        set_name: str
    ):
        self._client = redis_client
        self._set_name = set_name

    def increment(self, key: str):
        # Sorted set: each member's score is its view count.
        self._client.zincrby(self._set_name, 1, key)

    def most_common(self, n: int) -> Dict[str, int]:
        # Highest scores first; members come back as bytes.
        return {
            key.decode(): int(value)
            for key, value in
            self._client.zrange(
                self._set_name, 0, n-1,
                desc=True,
                withscores=True,
            )
        }
Redis is an in-memory data store. This means that by default, data is stored only in memory. Redis will persist data on disk during restart but may lose data in an unexpected crash (for instance, due to a power outage). Still, this is only a default behavior. Redis offers various modes for data persistence, some of which are comparable to other databases. This means Redis is a completely viable storage solution for our simple use case. You can read more about Redis persistence at https://redis.io/topics/persistence.
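Because RedisBackend receives its client as an argument, its logic can be exercised without a running Redis server. The sketch below uses FakeRedis, a hypothetical in-memory stand-in that mimics only the two client calls used above; it is not part of the redis package and does not reproduce all of their behavior. RedisBackend is repeated here without the base class and the Redis annotation so that the example runs on its own.

```python
from typing import Dict


class FakeRedis:
    """Hypothetical stand-in for the zincrby() and zrange() client calls."""

    def __init__(self):
        self._sets: Dict[str, Dict[bytes, float]] = {}

    def zincrby(self, name, amount, value):
        members = self._sets.setdefault(name, {})
        key = value.encode()
        members[key] = members.get(key, 0.0) + amount
        return members[key]

    def zrange(self, name, start, end, desc=False, withscores=False):
        members = self._sets.get(name, {})
        ordered = sorted(
            members.items(), key=lambda pair: (pair[1], pair[0]), reverse=desc
        )
        selected = ordered[start:end + 1]  # inclusive end, non-negative only
        return selected if withscores else [member for member, _ in selected]


class RedisBackend:
    def __init__(self, redis_client, set_name: str):
        self._client = redis_client
        self._set_name = set_name

    def increment(self, key: str):
        self._client.zincrby(self._set_name, 1, key)

    def most_common(self, n: int) -> Dict[str, int]:
        return {
            key.decode(): int(value)
            for key, value in self._client.zrange(
                self._set_name, 0, n - 1, desc=True, withscores=True
            )
        }


backend = RedisBackend(FakeRedis(), "my-stats")
for referer in ["/a", "/b", "/a", "/c", "/c", "/a"]:
    backend.increment(referer)

top_two = backend.most_common(2)
print(top_two)  # {'/a': 3, '/c': 2}
```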
Both backends have the same interface loosely enforced with an abstract base class. It means instances of both classes can be used interchangeably. The question is, how will we invert control of our track() and stats() functions in a way that will allow us to plug in a different views storage implementation?
Let's recall the signatures of our functions:
@app.route('/stats')
def stats():
    ...

@app.route('/track')
def track():
    ...
In the Flask framework, the app.route() decorator registers a function as a specific route handler. You can think of it as a callback for HTTP request paths. You don't call that function manually anymore and Flask is in full control of the arguments passed to it. But we want to be able to easily replace the storage implementation. One way to do that would be through postponing the handler registration and letting our functions receive an extra storage argument. Consider the following example:
def track(storage: ViewsStorageBackend):
    try:
        referer = request.headers["Referer"]
    except KeyError:
        return Response(status=HTTPStatus.BAD_REQUEST)
    storage.increment(referer)
    return Response(
        PIXEL, headers={
            "Content-Type": "image/gif",
            "Expires": "Mon, 01 Jan 1990 00:00:00 GMT",
            "Cache-Control": "no-cache, no-store, must-revalidate",
            "Pragma": "no-cache",
        }
    )

def stats(storage: ViewsStorageBackend):
    return storage.most_common(10)
Our extra argument is annotated with the ViewsStorageBackend type so the type can be easily verified with an IDE or additional tools. Thanks to this, we have inverted control of those functions and also achieved better modularity. Now you can easily switch the implementation of storage between different classes with a compatible interface. The extra benefit of IoC is that we can easily unit-test the stats() and track() functions in isolation from storage implementations.
We will discuss the topic of unit-tests together with detailed examples of tests that leverage IoC in Chapter 10, Testing and Quality Automation.
The only part that is missing is actual route registration. We can no longer use the app.route() decorator directly on our functions. That's because Flask won't be able to resolve the storage argument on its own. We can overcome that problem by "pre-injecting" the desired storage implementations into handler functions and creating new functions that can be easily registered with the app.route() call.
The simple way to do that would be using the partial() function from the functools module. It takes a single function together with a set of arguments and keyword arguments and returns a new function that has selected arguments preconfigured. We can use that approach to prepare various configurations of our service. Here, for instance, is an application configuration that uses Redis as a storage backend:
from functools import partial

if __name__ == '__main__':
    views_storage = RedisBackend(Redis(host="redis"), "my-stats")
    app.route("/track", endpoint="track")(
        partial(track, storage=views_storage))
    app.route("/stats", endpoint="stats")(
        partial(stats, storage=views_storage))
    app.run(host="0.0.0.0", port=8000)
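The pre-injection step can be looked at without Flask. In this minimal sketch, StubStorage is a hypothetical storage with canned results, and stats() has the same shape as the handler above.

```python
from functools import partial


def stats(storage):
    # The dependency arrives as an argument instead of a module variable.
    return storage.most_common(10)


class StubStorage:
    """Hypothetical storage that returns fixed data, for illustration only."""

    def most_common(self, n):
        return {"http://localhost:8000/test": 6}


# The result is a zero-argument callable with storage already bound.
stats_view = partial(stats, storage=StubStorage())

print(stats_view())              # {'http://localhost:8000/test': 6}
print(stats_view.func is stats)  # True
```

According to the functools documentation, partial objects do not get a __name__ attribute automatically. That is presumably why the configuration above passes endpoint= explicitly instead of letting Flask derive the endpoint name from the view function.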
The presented approach can be applied to many other web frameworks as the majority of them have the same route-to-handler structure. It will work especially well for small services with only a handful of endpoints. Unfortunately, it may not scale well in large applications. It is simple to write but definitely not the easiest to read. Seasoned Flask programmers will for sure feel this approach is unnatural and needlessly repetitive. Here, it simply breaks the common convention of writing Flask handler functions.
The ultimate solution would be one that allows you to write and register view functions without the need to manually inject dependent objects. So, for instance:
@app.route('/track')
def track(storage: ViewsStorageBackend):
    ...
In order to do that, we would need the Flask framework to:
- Recognize extra arguments as dependencies of views.
- Allow the definition of a default implementation for said dependencies.
- Automatically resolve dependencies and inject them into views at runtime.
Such a mechanism is referred to as dependency injection, which we mentioned previously. Some web frameworks offer a built-in dependency injection mechanism, but in the Python ecosystem, it is a rather rare occurrence. Fortunately, there are plenty of lightweight dependency injection libraries that can be added on top of any Python framework. We will explore such a possibility in the next section.
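To get a feel for what such a mechanism involves, here is a deliberately naive sketch of the three requirements using only the Standard Library. The autowire() decorator, the providers mapping, and the Storage class are all hypothetical names; this is not how Flask or any particular dependency injection library is implemented.

```python
import functools
import inspect
from typing import Any, Callable, Dict, get_type_hints


class Storage:
    """Hypothetical dependency type used only in this sketch."""

    def __init__(self) -> None:
        self.seen = []


def autowire(providers: Dict[type, Callable[[], Any]]):
    """Return a decorator that fills in missing arguments by annotation."""

    def decorator(func):
        # 1. Recognize annotated arguments as potential dependencies.
        hints = get_type_hints(func)
        hints.pop("return", None)
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            supplied = signature.bind_partial(*args, **kwargs).arguments
            for name, annotation in hints.items():
                # 3. Resolve at call time, unless the caller passed a value.
                if name not in supplied and annotation in providers:
                    kwargs[name] = providers[annotation]()
            return func(*args, **kwargs)

        return wrapper

    return decorator


# 2. Define the default implementation for each dependency type.
shared_storage = Storage()
providers = {Storage: lambda: shared_storage}


@autowire(providers)
def track(referer: str, storage: Storage) -> int:
    storage.seen.append(referer)
    return len(storage.seen)


print(track("/a"))                     # 1
print(track("/b"))                     # 2
print(track("/c", storage=Storage()))  # 1, explicit argument wins
```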
Using dependency injection frameworks
When IoC is used at a great scale, it can easily become overwhelming. The example from the previous section was quite simple so it didn't require a lot of setup. Unfortunately, we have sacrificed a bit of readability and expressiveness for better modularity and responsibility isolation. For larger applications, this can be a serious problem.
Dedicated dependency injection libraries come to the rescue by combining a simple way to mark function or object dependencies with a runtime dependency resolution. All of that usually can be achieved with minimal impact on the overall code structure.
There are plenty of dependency injection libraries for Python, so definitely there is no need to build your own from scratch. They are often similar in implementation and functionality, so we will simply pick one and see how it could be applied in our view tracking application.
Our library of choice will be the injector library, which is freely available on PyPI. We pick it for several reasons:
- Reasonably active and mature: Developed over more than 10 years with releases every few months.
- Framework support: It has community support for various frameworks including Flask through the flask-injector package.
- Typing annotation support: It allows writing unobtrusive dependency annotations and leveraging static typing analysis.
- Simple: injector has a Pythonic API. It makes code easy to read and to reason about.
You can install injector in your environment using pip as follows:
$ pip install injector
You can find more information about injector at https://github.com/alecthomas/injector.
In our example, we will use the flask-injector package as it provides some initial boilerplate to integrate injector with Flask seamlessly. But before we do that, we will first separate our application into several modules that would better simulate a larger application. After all, dependency injection really shines in applications that have multiple components.
We will create the following Python modules:
- interfaces: This will be the module holding our interfaces. It will contain ViewsStorageBackend from the previous section without any changes.
- backends: This will be the module holding specific implementations of storage backends. It will contain CounterBackend and RedisBackend from the previous section without any changes.
- tracking: This will be the module holding the application setup together with view functions.
- di: This will be the module holding definitions for the injector library, which will allow it to automatically resolve dependencies.
The core of the injector library is the Module class. It defines a so-called dependency injection container, an atomic block of mapping between dependency interfaces and their actual implementation instances. A minimal Module subclass may look as follows:
from injector import Module, provider

class MyModule(Module):
    @provider
    def provide_dependency(self, *args) -> Type:
        return ...
The @provider decorator marks a Module method as a method providing the implementation for a particular Type interface. The creation of some objects may be complex, so injector allows modules to have additional nondecorated helper methods.
The method that provides a dependency may also have its own dependencies. They are defined as method arguments with type annotations. This allows for cascading dependency resolution. injector supports composing a dependency injection context from multiple modules, so there's no need to define all dependencies in a single module.
Using the above template, we can create our first injector module in the di.py file. It will be CounterModule, which provides a CounterBackend implementation for the ViewsStorageBackend interface. The definition will be as follows:
from injector import Module, provider, singleton
from interfaces import ViewsStorageBackend
from backends import CounterBackend

class CounterModule(Module):
    @provider
    @singleton
    def provide_storage(self) -> ViewsStorageBackend:
        return CounterBackend()
CounterBackend doesn't take any arguments, so we don't have to define extra dependencies. The only difference from the general module template is the @singleton decorator. It is an explicit implementation of the singleton design pattern. A singleton is simply a class that can have only a single instance. In this context, it means that every time this dependency is resolved, injector will always return the same object. We need that because CounterBackend stores view counters under the internal _counter attribute. Without the @singleton decorator, every request for the ViewsStorageBackend implementation would return a completely new object and thus we would constantly lose track of view numbers.
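The difference that a singleton scope makes can be imitated with plain functions. In this sketch, lru_cache() stands in for the scope as an analogy only; it says nothing about how injector implements @singleton, and both provider names are hypothetical.

```python
from collections import Counter
from functools import lru_cache


def provide_fresh_storage() -> Counter:
    # A new object on every resolution: earlier counts are unreachable.
    return Counter()


@lru_cache(maxsize=None)
def provide_shared_storage() -> Counter:
    # Created once, then the same object is returned on every call.
    return Counter()


provide_fresh_storage()["/test"] += 1
provide_shared_storage()["/test"] += 1

print(provide_fresh_storage()["/test"])   # 0, the increment was lost
print(provide_shared_storage()["/test"])  # 1
```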
The implementation of RedisModule will be only slightly more complex:
from injector import Module, provider, singleton
from redis import Redis
from interfaces import ViewsStorageBackend
from backends import RedisBackend

class RedisModule(Module):
    @provider
    def provide_storage(self, client: Redis) -> ViewsStorageBackend:
        return RedisBackend(client, "my-set")

    @provider
    @singleton
    def provide_redis_client(self) -> Redis:
        return Redis(host="redis")
The code files for this chapter provide a complete docker-compose environment with a preconfigured Redis Docker image so you don't have to install Redis on your own host.
In the RedisModule class, we take advantage of the injector library's ability to resolve cascading dependencies. The RedisBackend constructor requires a Redis client instance, so we can treat it as another provide_storage() method argument. injector will recognize the typing annotation and automatically match the method that provides the Redis class instance. We could go even further and extract the host argument into a separate configuration dependency. We won't do that for the sake of simplicity.
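The idea of cascading resolution can be sketched with a small recursive function. Everything here (Client, Backend, build_registry(), resolve()) is hypothetical and far simpler than what injector does; the sketch only shows how return annotations can identify providers and how argument annotations can identify their dependencies.

```python
from typing import get_type_hints


class Client:
    """Hypothetical stand-in for a service client such as Redis."""

    def __init__(self, host: str) -> None:
        self.host = host


class Backend:
    """Hypothetical storage backend that needs a client."""

    def __init__(self, client: Client) -> None:
        self.client = client


def provide_client() -> Client:
    return Client(host="redis")


def provide_backend(client: Client) -> Backend:
    return Backend(client)


def build_registry(*providers):
    # Map each provider's return annotation to the provider itself.
    return {get_type_hints(p)["return"]: p for p in providers}


def resolve(interface, registry):
    provider = registry[interface]
    hints = get_type_hints(provider)
    hints.pop("return")
    # Resolve the provider's own dependencies first, then call it.
    kwargs = {name: resolve(dep, registry) for name, dep in hints.items()}
    return provider(**kwargs)


registry = build_registry(provide_client, provide_backend)
backend = resolve(Backend, registry)
print(type(backend.client).__name__, backend.client.host)  # Client redis
```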
Now we have to tie everything up in the tracking module. We will be relying on injector to resolve dependencies on views. This means that we can finally define track() and stats() handlers with extra storage arguments and register them with the @app.route() decorator as if they were normal Flask views. Updated signatures will be the following:
@app.route('/stats')
def stats(storage: ViewsStorageBackend):
    ...

@app.route('/track')
def track(storage: ViewsStorageBackend):
    ...
What is left is the final configuration of the app that designates which modules should be used to provide interface implementations. If we would like to use RedisBackend, we would finish our tracking module with the following code:
import di

if __name__ == '__main__':
    FlaskInjector(app=app, modules=[di.RedisModule()])
    app.run(host="0.0.0.0", port=8000)
The following is the complete code of the tracking module:
from http import HTTPStatus
from flask import Flask, request, Response
from flask_injector import FlaskInjector
from interfaces import ViewsStorageBackend
import di

app = Flask(__name__)
PIXEL = (
    b'GIF89a\x01\x00\x01\x00\x80\x00\x00\x00'
    b'\x00\x00\xff\xff\xff!\xf9\x04\x01\x00'
    b'\x00\x00\x00,\x00\x00\x00\x00\x01\x00'
    b'\x01\x00\x00\x02\x01D\x00;'
)

@app.route('/track')
def track(storage: ViewsStorageBackend):
    try:
        referer = request.headers["Referer"]
    except KeyError:
        return Response(status=HTTPStatus.BAD_REQUEST)
    storage.increment(referer)
    return Response(
        PIXEL, headers={
            "Content-Type": "image/gif",
            "Expires": "Mon, 01 Jan 1990 00:00:00 GMT",
            "Cache-Control": "no-cache, no-store, must-revalidate",
            "Pragma": "no-cache",
        }
    )

@app.route('/stats')
def stats(storage: ViewsStorageBackend):
    return storage.most_common(10)

@app.route("/test")
def test():
    return """
    <html>
    <head></head>
    <body><img src="/track"></body>
    </html>
    """

if __name__ == '__main__':
    FlaskInjector(app=app, modules=[di.RedisModule()])
    app.run(host="0.0.0.0", port=8000)
As you can see, the introduction of the dependency injection mechanism didn't change the core of our application a lot. The preceding code closely resembles the first and simplest iteration, which didn't have the IoC mechanism. At the cost of a few interface and injector module definitions, we've got scaffolding for a modular application that could easily grow into something much bigger. We could, for instance, extend it with additional storage that would serve more analytical purposes or provide a dashboard that allows you to view the data from different angles.
Another advantage of dependency injection is loose coupling. In our example, views never create instances of storage backends or their underlying service clients (in the case of RedisBackend). They depend on shared interfaces but are independent of implementations. Loose coupling is usually a good foundation for a well-architected application.
It is of course hard to show the utility of IoC and dependency injection in a really concise example like the one we've just seen. That's because these techniques really shine in big applications. Anyway, we will revisit the use case of the pixel tracking application in Chapter 10, Testing and Quality Automation, where we will show that IoC greatly improves the testability of your code.