Caveats in Python
Besides understanding the main features of the language, being able to write idiomatic code is also about being aware of the potential problems of some idioms, and how to avoid them. In this section, we will explore common issues that might cause you long debugging sessions if they catch you off guard.
Most of the points discussed in this section are things to avoid entirely, and I will dare to say that there is almost no possible scenario that justifies the presence of the anti-pattern (or idiom, in this case). Therefore, if you find this on the code base you are working on, feel free to refactor it in the way that is suggested. If you find these traits while doing a code review, this is a clear indication that something needs to change.
Mutable default arguments
Simply put, don't use mutable objects as the default arguments of functions. If you use mutable objects as default arguments, you will get results that are not the expected ones.
Consider the following erroneous function definition:
def wrong_user_display(user_metadata: dict = {"name": "John", "age": 30}):
name = user_metadata.pop("name")
age = user_metadata.pop("age")
return f"{name} ({age})"
This has two problems, actually. Besides the default mutable argument, the body of the function is mutating a mutable object, and hence creating a side effect. But the main problem is the default argument for user_metadata
.
This will actually only work the first time it is called without arguments. For the second time, we call it without explicitly passing something to user_metadata
. It will fail with a KeyError
, like so:
>>> wrong_user_display()
'John (30)'
>>> wrong_user_display({"name": "Jane", "age": 25})
'Jane (25)'
>>> wrong_user_display()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ... in wrong_user_display
name = user_metadata.pop("name")
KeyError: 'name'
The explanation is simple—by assigning the dictionary with the default data to user_metadata
on the definition of the function, this dictionary is actually created once and the user_metadata
variable points to it. When the Python interpreter parses the file, it'll read the function, and find a statement in the signature that creates the dictionary and assigns it to the parameter. From that point on, the dictionary is created only once, and it's the same for the entire life of the program.
Then, the body of the function modifies this object, which remains alive in memory so long as the program is running. When we pass a value to it, this will take the place of the default argument we just created. When we don't want this object, it is called again, and it has been modified since the previous run; the next time we run it, will not contain the keys since they were removed on the previous call.
The fix is also simple—we need to use None
as a default sentinel value and assign the default on the body of the function. Because each function has its own scope and life cycle, user_metadata
will be assigned to the dictionary every time None
appears:
def user_display(user_metadata: dict = None):
user_metadata = user_metadata or {"name": "John", "age": 30}
name = user_metadata.pop("name")
age = user_metadata.pop("age")
return f"{name} ({age})"
Let's conclude the section by understanding the quirks of extending built-in types.
Extending built-in types
The correct way of extending built-in types such as lists, strings, and dictionaries is by means of the collections
module.
If you create a class that directly extends dict
, for example, you will obtain results that are probably not what you are expecting. The reason for this is that in CPython (a C optimization), the methods of the class don't call each other (as they should), so if you override one of them, this will not be reflected by the rest, resulting in unexpected outcomes. For example, you might want to override __getitem__
, and then when you iterate the object with a for
loop, you will notice that the logic you have put on that method is not applied.
This is all solved by using collections.UserDict
, for example, which provides a transparent interface to actual dictionaries, and is more robust.
Let's say we want a list that was originally created from numbers to convert the values to strings, adding a prefix. The first approach might look like it solves the problem, but it is erroneous:
class BadList(list):
def __getitem__(self, index):
value = super().__getitem__(index)
if index % 2 == 0:
prefix = "even"
else:
prefix = "odd"
return f"[{prefix}] {value}"
At first sight, it looks like the object behaves as we want it to. But then, if we try to iterate it (after all, it is a list
), we find that we don't get what we wanted:
>>> bl = BadList((0, 1, 2, 3, 4, 5))
>>> bl[0]
'[even] 0'
>>> bl[1]
'[odd] 1'
>>> "".join(bl)
Traceback (most recent call last):
...
TypeError: sequence item 0: expected str instance, int found
The join
function will try to iterate (run a for
loop over) the list
but expects values of the string
type. We would expect this to work because we modified the __getitem__
method so that it always returns a string
. However, based on the result, we can conclude that our modified version of __getitem__
is not being called.
This issue is actually an implementation detail of CPython, while in other platforms such as PyPy this doesn't happen (see the differences between PyPy and CPython in the references at the end of this chapter).
Regardless of this, we should write code that is portable and compatible with all implementations, so we will fix it by extending not from list
, but UserList
:
from collections import UserList
class GoodList(UserList):
def __getitem__(self, index):
value = super().__getitem__(index)
if index % 2 == 0:
prefix = "even"
else:
prefix = "odd"
return f"[{prefix}] {value}"
And now things look much better:
>>> gl = GoodList((0, 1, 2))
>>> gl[0]
'[even] 0'
>>> gl[1]
'[odd] 1'
>>> "; ".join(gl)
'[even] 0; [odd] 1; [even] 2'
Don't extend directly from dict
; use collections.UserDict
instead. For lists, use collections.UserList
, and for strings, use collections.UserString
.
At this point, we know all the main concepts of Python. Not only how to write idiomatic code that blends well with Python itself, but also to avoid certain pitfalls. The next section is complementary.
Before finishing the chapter, I wanted to give a quick introduction to asynchronous programming, because while it is not strictly related to clean code per se, asynchronous code has become more and more popular, following up with the idea that, in order to work effectively with code, we must be able to read it and understand it, because being able to read asynchronous code is important.