All of the properties and functions of an object are public in Python, which is different from other languages where attributes can be public, private, or protected. That is, there is no point in trying to prevent caller objects from invoking any attribute an object has; there is no way to mark some attributes as private or protected, as other programming languages allow.
There is no strict enforcement, but there are some conventions. An attribute that starts with an underscore is meant to be private to that object, and we expect that no external agent calls it (but again, nothing is preventing this).
Before jumping into the details of properties, it's worth mentioning some traits of underscores in Python: the convention behind them, and the scope of attributes.
Underscores in Python
There are some conventions and implementation details that make use of underscores in Python, which are an interesting topic worthy of analysis.
As we mentioned previously, by default, all attributes of an object are public. Consider the following example to illustrate this:
>>> class Connector:
...     def __init__(self, source):
...         self.source = source
...         self._timeout = 60
...
>>> conn = Connector("postgresql://localhost")
>>> conn.source
'postgresql://localhost'
>>> conn._timeout
60
>>> conn.__dict__
{'source': 'postgresql://localhost', '_timeout': 60}
Here, a Connector object is created with source, and it starts with two attributes: the aforementioned source and _timeout. The former is public and the latter private. However, as we can see from the preceding lines, when we create an object like this, we can actually access both of them.
The interpretation of this code is that _timeout should be accessed only within Connector itself and never from a caller. This means that you should organize the code so that you can safely refactor the timeout whenever needed, relying on the fact that it's not being called from outside the object (only internally), hence preserving the same interface as before. Complying with these rules makes the code easier to maintain and more robust because we don't have to worry about ripple effects when refactoring, as long as we maintain the interface of the object. The same principle applies to methods as well.
Classes should only expose those attributes and methods that are relevant to an external caller object, namely, those entailing its interface. Everything that is not strictly part of an object's interface should be kept prefixed with a single underscore.
Attributes that start with an underscore must be respected as private and not be called externally. On the other hand, as an exception to this rule, we could say that in unit tests, it might be allowed to access internal attributes if this makes things easier to test (but note that adhering to this pragmatic approach still suffers from the maintainability cost when you decide to refactor the main class). However, keep in mind the following recommendation:
Using too many internal methods and attributes could be a sign that the class has too many tasks and doesn't comply with the single responsibility principle. This could indicate that you need to extract some of its responsibilities into more collaborating classes.
Using a single underscore as a prefix is the Pythonic way of clearly delimiting the interface of an object. There is, however, a common misconception that some attributes and methods can actually be made private with a double underscore. Let's imagine that the timeout attribute is now defined with a leading double underscore instead:
>>> class Connector:
...     def __init__(self, source):
...         self.source = source
...         self.__timeout = 60
...
...     def connect(self):
...         print("connecting with {0}s".format(self.__timeout))
...         # ...
...
>>> conn = Connector("postgresql://localhost")
>>> conn.connect()
connecting with 60s
>>> conn.__timeout
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Connector' object has no attribute '__timeout'
Some developers use this method to hide attributes, thinking, as in this example, that timeout is now private and that no other object can modify it. Now, take a look at the exception that is raised when trying to access __timeout. It's AttributeError, saying that the attribute doesn't exist. It doesn't say something like "this is private" or "this can't be accessed". It says it does not exist. This should give us a clue that, in fact, something different is happening and that this behavior is just a side effect, not the real effect we want.
What's actually happening is that with the double underscores, Python creates a different name for the attribute (this is called name mangling). It creates the attribute with the following name instead: "_<class-name>__<attribute-name>". In this case, an attribute named '_Connector__timeout' will be created, and this attribute can be accessed (and modified) as follows:
>>> vars(conn)
{'source': 'postgresql://localhost', '_Connector__timeout': 60}
>>> conn._Connector__timeout
60
>>> conn._Connector__timeout = 30
>>> conn.connect()
connecting with 30s
Notice the side effect that we mentioned earlier: the attribute still exists, only with a different name, and for that reason, the AttributeError was raised on our first attempt to access it.
The idea of the double underscore in Python is completely different. It was created as a means to override methods of a class that is going to be extended several times, without the risk of collisions with the method names. Even that use case is too far-fetched to justify the use of this mechanism.
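To illustrate the collision-avoidance scenario that name mangling was designed for, here is a minimal sketch (the Base and Derived classes are hypothetical):
class Base:
    def __run(self):           # mangled to _Base__run
        print("Base internal step")

    def run(self):
        self.__run()           # always resolves to _Base__run

class Derived(Base):
    def __run(self):           # mangled to _Derived__run, so no collision
        print("Derived internal step")

Derived().run()  # prints "Base internal step"
Because each class mangles the name with its own class name, Derived can define its own __run without accidentally overriding the one Base relies on.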
Double underscores are a non-Pythonic approach. If you need to define attributes as private, use a single underscore, and respect the Pythonic convention that it is a private attribute.
Do not define attributes with leading double underscores. By the same token, do not define your own "dunder" methods (methods whose names are surrounded by double underscores).
Let's now explore the opposite case, that is, when we do want to access some attributes of an object that are intended to be public. Typically, we'd use properties for this, which we will explore in the next section.
Properties
Typically, in object-oriented design, we create objects to represent an abstraction over an entity of the domain problem. In this sense, objects can encapsulate behavior or data. And more often than not, the accuracy of the data determines if an object can be created or not. That is to say, some entities can only exist for certain values of the data, whereas incorrect values shouldn't be allowed.
This is why we create validation methods, typically to be used in the setter operations. However, in Python, sometimes we can encapsulate these setter and getter methods more compactly by using properties.
Consider the example of a geographical system that needs to deal with coordinates. There is only a certain range of values for which latitude and longitude make sense. Outside of those values, a coordinate cannot exist. We can create an object to represent a coordinate, but in doing so we must ensure that the values for latitude and longitude are at all times within the acceptable ranges. And for this we can use properties:
class Coordinate:
    def __init__(self, lat: float, long: float) -> None:
        self._latitude = self._longitude = None
        self.latitude = lat
        self.longitude = long

    @property
    def latitude(self) -> float:
        return self._latitude

    @latitude.setter
    def latitude(self, lat_value: float) -> None:
        # A chained comparison works for floats too; a range() membership
        # check would only accept integer values.
        if not (-90 <= lat_value <= 90):
            raise ValueError(f"{lat_value} is an invalid value for latitude")
        self._latitude = lat_value

    @property
    def longitude(self) -> float:
        return self._longitude

    @longitude.setter
    def longitude(self, long_value: float) -> None:
        if not (-180 <= long_value <= 180):
            raise ValueError(f"{long_value} is an invalid value for longitude")
        self._longitude = long_value
Here, we're using a property to define the latitude and longitude. In doing so, we establish that retrieving any of these attributes will return the internal value held in the private variables. More importantly, when any user wants to modify values for any of these properties in the following form:
coordinate.latitude = <new-latitude-value>
The validation method that's declared with the @latitude.setter decorator will be automatically (and transparently) invoked, and it will pass the value on the right-hand side of the statement (<new-latitude-value>) as the parameter (named lat_value in the preceding code).
Don't write custom get_* and set_* methods for all attributes on your objects. Most of the time, leaving them as regular attributes is just enough. If you need to modify the logic for when an attribute is retrieved or modified, then use properties.
We have seen the case for when an object needs to hold values, and how properties help us to manage their internal data in a consistent and transparent way, but sometimes, we might also need to do some computations based on the state of the object and its internal data. Most of the time, properties are a good choice for this.
For example, if you have an object that needs to return a value in a particular format, or data type, a property can be used to do this computation. In the previous example, if we decided that we wanted to return the coordinates with a precision of up to four decimal places (regardless of how many decimal places the original number was provided with), we could do the rounding in the @property method that reads the value.
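A minimal sketch of this idea, with the validation trimmed out for brevity:
class Coordinate:
    def __init__(self, lat: float) -> None:
        self._latitude = lat

    @property
    def latitude(self) -> float:
        # The computation happens when the attribute is read
        return round(self._latitude, 4)

print(Coordinate(12.3456789).latitude)  # 12.3457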
You might find that properties are a good way to achieve command and query separation (CC08). The command and query separation principle states that a method of an object should either answer a question or do something, but not both. If a method is doing something, and at the same time it returns a status answering a question about how that operation went, then it's doing more than one thing, clearly violating the principle that says that functions should do one thing, and one thing only.
Depending on the name of the method, this can create even more confusion, making it harder for readers to understand what the actual intention of the code is. For example, if a method is called set_email, and we use it as if self.set_email("a@j.com"): ..., what is that code doing? Is it setting the email to a@j.com? Is it checking if the email is already set to that value? Both (setting and then checking if the status is correct)?
With properties, we can avoid this kind of confusion. The @property decorator is the query that will answer a question, and @<property_name>.setter is the command that will do something.
Another piece of good advice derived from this example is as follows—don't do more than one thing in a method. If you want to assign something and then check the value, break that down into two or more statements.
To illustrate what this means, using the previous example, we would have one setter method to set the email of the user, and then a property to simply ask for the email. This is because, in general, any time we ask an object about its current state, it should return it without side effects (without changing its internal representation). Perhaps the only exception I can think of to this rule would be the case of a lazy property: something we want to precompute only once, and then use the computed value. For the rest of the cases, try to make properties idempotent, and use methods when you need to change the internal representation of the object, but don't mix both.
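A short sketch of this separation (the User class and its validation rule are hypothetical):
class User:
    def __init__(self):
        self._email = None

    @property
    def email(self):              # the query: answers without side effects
        return self._email

    @email.setter
    def email(self, new_email):   # the command: changes state, returns nothing
        if "@" not in new_email:
            raise ValueError(f"{new_email} is not a valid email")
        self._email = new_email

user = User()
user.email = "a@j.com"   # the command sets the value (or raises)
print(user.email)        # the query simply reports it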
Methods should do one thing only. If you have to run an action and then check for the status, do that in separate methods that are called by different statements.
Creating classes with a more compact syntax
Continuing with the idea that sometimes we need objects to hold values, there's a common boilerplate in Python when it comes to the initialization of objects, which is to declare in the __init__ method all attributes that the object will have, and then set them to internal variables, typically in the following form:
def __init__(self, x, y, …):
    self.x = x
    self.y = y
Since Python 3.7, we can simplify this by using the dataclasses module, introduced in PEP-557. We saw this module in the previous chapter, in the context of using annotations in the code, and here we'll review it briefly in terms of how it helps us write more compact code.
This module provides the @dataclass decorator, which, when applied to a class, takes all the class attributes with annotations and treats them as instance attributes, as if they were declared in the initialization method. When using this decorator, it will automatically generate the __init__ method on the class, so we don't have to.
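For instance, a minimal sketch (the Point class is hypothetical):
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

# The generated __init__ is equivalent to writing it by hand:
p = Point(1.0, 2.0)
print(p)  # Point(x=1.0, y=2.0); a __repr__ is generated as well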
Additionally, this module provides a field object that will help us define particular traits for some of the attributes. For example, if one of the attributes needs to be mutable (such as a list), we'll see later in the chapter (in the section on avoiding caveats in Python) that we cannot pass this default empty list in the __init__ method, and that instead we should pass None and set it to a default list inside __init__ if None was provided.
When using the field object, what we would do instead is use the default_factory argument and provide the list class to it. This argument is meant to be used with a callable that takes no arguments, which will be called to construct the object when nothing is provided for the value of that attribute.
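As a quick sketch of what that looks like (the Bucket class is a made-up example):
from dataclasses import dataclass, field
from typing import List

@dataclass
class Bucket:
    # default_factory is called once per instance, so every Bucket gets
    # its own fresh list (a plain mutable default would be rejected)
    items: List[int] = field(default_factory=list)

b1, b2 = Bucket(), Bucket()
b1.items.append(1)
print(b2.items)  # []: the lists are independent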
Because there's no __init__ method to be implemented, what happens if we need to run validations? Or if we want to have some attributes computed or derived from previous ones? To answer the latter, we can rely on properties, as we just explored in the previous section. As for the former, data classes allow us to have a __post_init__ method that will be called automatically by __init__, so this would be a good place to write our logic for post-initialization.
To put all of this into practice, let's consider the example of modeling a node for an R-Trie data structure (where R stands for radix, which means it is an indexed tree over some base R). The details of this data structure, and the algorithms associated with it, are beyond the scope of this book, but for the purposes of the example, I'll mention that it is a data structure designed to answer queries over text or strings (such as prefixes, and finding similar or related words). In a very basic form, this data structure contains a value (that holds a character; it can be its integer representation, for instance), and then an array of length R with references to the next nodes (it's a recursive data structure, in the same sense as a linked list or a tree, for example). The idea is that each position of the array implicitly defines a reference to the next node. For example, imagine the value 0 is mapped to the character 'a'; then, if the next node contains a value different than None in its 0 position, this means there's a reference for 'a', and that points to another R-Trie node.
Graphically, the data structure might look something like this:
Figure 2.1: Generic structure for an R-Trie node
And we could write a code block like the following one to represent it. In the following code, the attribute named next_ contains a trailing underscore, just as a way to differentiate it from the built-in next function. We can argue that in this case there's no collision, but if we needed to use the next() function within the RTrieNode class, that could be problematic (and those are usually hard-to-catch subtle errors):
from typing import List
from dataclasses import dataclass, field

R = 26

@dataclass
class RTrieNode:
    size = R
    value: int
    next_: List["RTrieNode"] = field(
        default_factory=lambda: [None] * R)

    def __post_init__(self):
        if len(self.next_) != self.size:
            raise ValueError("Invalid length provided for next list")
The preceding example contains several different combinations. First, we define an R-Trie with R=26 to represent the characters in the English alphabet (this is not important for understanding the code itself, but it gives more context). The idea is that if we want to store a word, we create a node for each letter, starting with the first one. When there's a link to the next character, we store it in the position of the next_ array corresponding to that character, creating another node for that one, and so on.
Note the first attribute in the class: size. This one doesn't have an annotation, so it's a regular class attribute (shared by all node objects), not something that belongs exclusively to each instance. Alternatively, we could have defined this by setting field(init=False), but this form is more compact. However, if we wanted to annotate the variable but not consider it part of __init__, then field(init=False) would be the only viable alternative.
Then follow two other attributes, both of which have annotations, but with different considerations. The first one, value, is an integer, and it doesn't have a default argument, so when we create a new node we must always provide a value as the first parameter. The second one is a mutable argument (a list of nodes of the same type), and it does have a default factory: in this case, a lambda function that will create a new list of size R, initialized with None on all slots. Note that if we had used field(default_factory=list) for this, we would still have constructed a new list for each object on creation, but we would lose control over the length of that list. And finally, we want to validate that we don't create nodes that have a list of next nodes with the wrong length, so this is validated in the __post_init__ method. Any attempt to create such a list will be prevented with a ValueError at initialization time.
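A quick interactive sketch of how these pieces behave (assuming the class as defined above):
>>> node = RTrieNode(value=0)
>>> len(node.next_)
26
>>> RTrieNode(value=0, next_=[None, None])
Traceback (most recent call last):
    ...
ValueError: Invalid length provided for next list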
Data classes provide a more compact way of writing classes, without the boilerplate of having to set all variables with the same name in the __init__ method.
When you have objects that don't do many complex validations or transformations on the data, consider this alternative. But keep in mind this last point: annotations are great, but they don't enforce data conversion. This means that, for example, if you declare an attribute that needs to be a float or an integer, then you must do this conversion in the __init__ method. Writing this as a data class won't do it, and that might hide subtle errors. This is for cases when validations aren't strictly required and type casts are possible. For example, it's perfectly fine to define an object that can be created from multiple other types, like converting a float from a numeric string (after all, this leverages Python's dynamic typing nature), provided this is correctly converted to the required data type within the __init__ method.
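A minimal sketch of this caveat (the Measurement classes are hypothetical):
from dataclasses import dataclass

@dataclass
class Measurement:
    value: float  # the annotation documents intent but converts nothing

m = Measurement("3.14")  # accepted silently
print(type(m.value))     # <class 'str'>: the annotation didn't cast it

class ConvertedMeasurement:
    def __init__(self, value):
        self.value = float(value)  # explicit conversion in __init__

print(type(ConvertedMeasurement("3.14").value))  # <class 'float'>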
Probably a good use case for data classes is all those places where we need to use objects as data containers or wrappers, namely, situations in which we would use named tuples or simple namespaces. Consider data classes as another alternative to named tuples or namespaces when you're evaluating options in your code.
Iterable objects
In Python, we have objects that can be iterated by default. For example, lists, tuples, sets, and dictionaries can not only hold data in the structure we want, but also be iterated over in a for loop to get those values repeatedly.
However, the built-in iterable objects are not the only kind that we can have in a for loop. We could also create our own iterable, with the logic we define for iteration.
In order to achieve this, we rely, once again, on magic methods.
Iteration works in Python by its own protocol (namely, the iterator protocol). When you try to iterate an object in the form for e in myobject:..., what Python checks at a very high level are the following two things, in order:
1. If the object contains one of the iterator methods: __next__ or __iter__
2. If the object is a sequence and has __len__ and __getitem__
Therefore, as a fallback mechanism, sequences can be iterated, and so there are two ways of customizing our objects to be able to work in for loops.
Creating iterable objects
When we try to iterate an object, Python will call the iter() function over it. One of the first things this function checks for is the presence of the __iter__ method on that object, which, if present, will be executed.
The following code creates an object that allows iterating over a range of dates, producing one day at a time on every round of the loop:
from datetime import timedelta

class DateRangeIterable:
    """An iterable that contains its own iterator object."""

    def __init__(self, start_date, end_date):
        self.start_date = start_date
        self.end_date = end_date
        self._present_day = start_date

    def __iter__(self):
        return self

    def __next__(self):
        if self._present_day >= self.end_date:
            raise StopIteration()
        today = self._present_day
        self._present_day += timedelta(days=1)
        return today
This object is designed to be created with a pair of dates, and when iterated, it will produce each day in the interval of specified dates, which is shown in the following code:
>>> from datetime import date
>>> for day in DateRangeIterable(date(2018, 1, 1), date(2018, 1, 5)):
...     print(day)
...
2018-01-01
2018-01-02
2018-01-03
2018-01-04
>>>
Here, the for loop starts a new iteration over our object. At this point, Python will call the iter() function on it, which, in turn, will call the __iter__ magic method. This method is defined to return self, indicating that the object is its own iterator, so at that point every step of the loop will call the next() function on that object, which delegates to the __next__ method. In this method, we decide how to produce the elements and return one at a time. When there is nothing else to produce, we have to signal this to Python by raising the StopIteration exception.
This means that what is actually happening is similar to Python calling next() on our object every time until there is a StopIteration exception, at which point it knows it has to stop the for loop:
>>> r = DateRangeIterable(date(2018, 1, 1), date(2018, 1, 5))
>>> next(r)
datetime.date(2018, 1, 1)
>>> next(r)
datetime.date(2018, 1, 2)
>>> next(r)
datetime.date(2018, 1, 3)
>>> next(r)
datetime.date(2018, 1, 4)
>>> next(r)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ... __next__
    raise StopIteration
StopIteration
>>>
This example works, but it has a small problem: once exhausted, the iterable will continue to be empty, hence raising StopIteration. This means that if we use this on two or more consecutive for loops, only the first one will work, while the second one will be empty:
>>> r1 = DateRangeIterable(date(2018, 1, 1), date(2018, 1, 5))
>>> ", ".join(map(str, r1))
'2018-01-01, 2018-01-02, 2018-01-03, 2018-01-04'
>>> max(r1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: max() arg is an empty sequence
>>>
This is because of the way the iteration protocol works: an iterable constructs an iterator, and that is the one being iterated over. In our example, __iter__ just returned self, but we can make it create a new iterator every time it is called. One way of fixing this would be to create new instances of DateRangeIterable, which is not a terrible issue, but we can make __iter__ use a generator (generators are iterator objects), which is created anew every time:
class DateRangeContainerIterable:
    def __init__(self, start_date, end_date):
        self.start_date = start_date
        self.end_date = end_date

    def __iter__(self):
        current_day = self.start_date
        while current_day < self.end_date:
            yield current_day
            current_day += timedelta(days=1)
And this time it works:
>>> r1 = DateRangeContainerIterable(date(2018, 1, 1), date(2018, 1, 5))
>>> ", ".join(map(str, r1))
'2018-01-01, 2018-01-02, 2018-01-03, 2018-01-04'
>>> max(r1)
datetime.date(2018, 1, 4)
>>>
The difference is that each for loop calls __iter__ again, and each of those calls creates the generator again.
This is called a container iterable.
In general, it is a good idea to work with container iterables when dealing with generators.
Generators will be explained in more detail in Chapter 7, Generators, Iterators, and Asynchronous Programming.
Creating sequences
Maybe our object does not define the __iter__() method, but we still want to be able to iterate over it. If __iter__ is not defined on the object, the iter() function will look for the presence of __getitem__, and if this is not found, it will raise TypeError.
A sequence is an object that implements __len__ and __getitem__ and expects to be able to get the elements it contains, one at a time, in order, starting at zero as the first index. This means that you should be careful in the logic so that you correctly implement __getitem__ to expect this type of index, or the iteration will not work.
The example from the previous section had the advantage that it uses less memory. This means that it only holds one date at a time and knows how to produce the days one by one. However, it has the drawback that if we want to get the nth element, we have no way to do so but to iterate n times until we reach it. This is a typical trade-off in computer science between memory and CPU usage.
The implementation with an iterable will use less memory, but it takes up to O(n) to get an element, whereas implementing a sequence will use more memory (because we have to hold everything at once), but supports indexing in constant time, O(1).
The preceding notation (for example, O(n) ) is called asymptotic notation (or "big-O" notation) and it describes the order of complexity of the algorithm. At a very high level, this means how many operations the algorithm needs to perform as a function of the size of the input (n) . For more information on this, you can check out (ALGO01) listed at the end of the chapter, which contains a detailed study of asymptotic notation.
This is what the new implementation might look like:
class DateRangeSequence:
    def __init__(self, start_date, end_date):
        self.start_date = start_date
        self.end_date = end_date
        self._range = self._create_range()

    def _create_range(self):
        days = []
        current_day = self.start_date
        while current_day < self.end_date:
            days.append(current_day)
            current_day += timedelta(days=1)
        return days

    def __getitem__(self, day_no):
        return self._range[day_no]

    def __len__(self):
        return len(self._range)
Here is how the object behaves:
>>> s1 = DateRangeSequence(date(2018, 1, 1), date(2018, 1, 5))
>>> for day in s1:
...     print(day)
...
2018-01-01
2018-01-02
2018-01-03
2018-01-04
>>> s1[0]
datetime.date(2018, 1, 1)
>>> s1[3]
datetime.date(2018, 1, 4)
>>> s1[-1]
datetime.date(2018, 1, 4)
In the preceding code, we can see that negative indices also work. This is because the DateRangeSequence object delegates all of the operations to its wrapped object (a list), which is the best way to maintain compatibility and consistent behavior.
Evaluate the trade-off between memory and CPU usage when deciding which of the two possible implementations to use. In general, the iterable is preferable (and generators even more so), but keep in mind the requirements of every case.
Container objects
Containers are objects that implement a __contains__ method (that usually returns a Boolean value). This method is called in the presence of the in keyword of Python.
Something like the following:
element in container
is interpreted by Python as this:
container.__contains__(element)
You can imagine how much more readable (and Pythonic!) the code can be when this method is properly implemented.
Let's say we have to mark some points on a map of a game that has two-dimensional coordinates. We might expect to find a function like the following:
def mark_coordinate(grid, coord):
    if 0 <= coord.x < grid.width and 0 <= coord.y < grid.height:
        grid[coord] = MARKED
Now, the part that checks the condition of the first if statement seems convoluted; it doesn't reveal the intention of the code, it's not expressive, and, worst of all, it calls for code duplication (every part of the code where we need to check the boundaries before proceeding will have to repeat that if statement).
What if the map itself (called grid in the code) could answer this question? Even better, what if the map could delegate this action to an even smaller (and hence more cohesive) object?
We could solve this problem in a more elegant way with object-oriented design and with the help of a magic method. In this case, we can create a new abstraction to represent the limits of the grid, which can be made an object in itself. Figure 2.2 helps illustrate the point:
Figure 2.2: An example using composition, distributing responsibilities in different classes, and using the container magic method
Parenthetically, I'll mention that it's true that, in general, class names are nouns, and they're usually singular. So, it might sound strange to have a class named Boundaries, but if we think about it, perhaps for this particular case it makes sense to say that we have an object representing all the boundaries of a grid, especially because of the way it's being used (in this case, we're using it to validate whether a particular coordinate is within those boundaries).
With this design, we can ask the map if it contains a coordinate, and the map itself can have information about its limits and pass the query down to its internal collaborator:
class Boundaries:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def __contains__(self, coord):
        x, y = coord
        return 0 <= x < self.width and 0 <= y < self.height

class Grid:
    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.limits = Boundaries(width, height)

    def __contains__(self, coord):
        return coord in self.limits
This code alone is a much better implementation. First, it is a simple composition and it uses delegation to solve the problem. Both objects are really cohesive, having the minimal possible logic; the methods are short, and the logic speaks for itself: coord in self.limits is pretty much a declaration of the problem to solve, expressing the intention of the code.
From the outside, we can also see the benefits. It's almost as if Python is solving the problem for us:
def mark_coordinate(grid, coord):
    if coord in grid:
        grid[coord] = MARKED
Dynamic attributes for objects
It is possible to control the way attributes are obtained from objects by means of the __getattr__ magic method. When we call something like <myobject>.<myattribute>, Python will look for <myattribute> in the dictionary of the object, calling __getattribute__ on it. If this is not found (namely, the object does not have the attribute we are looking for), then the extra method, __getattr__, is called, passing the name of the attribute (myattribute) as a parameter.
By receiving this value, we can control what is returned for our objects. We can even create new attributes, and so on.
In the following listing, the __getattr__ method is demonstrated:
class DynamicAttributes:
    def __init__(self, attribute):
        self.attribute = attribute

    def __getattr__(self, attr):
        if attr.startswith("fallback_"):
            name = attr.replace("fallback_", "")
            return f"[fallback resolved] {name}"
        raise AttributeError(
            f"{self.__class__.__name__} has no attribute {attr}"
        )
Here are some calls to an object of this class:
>>> dyn = DynamicAttributes("value")
>>> dyn.attribute
'value'
>>> dyn.fallback_test
'[fallback resolved] test'
>>> dyn.__dict__["fallback_new"] = "new value"
>>> dyn.fallback_new
'new value'
>>> getattr(dyn, "something", "default")
'default'
The first call is straightforward: we just request an attribute that the object has and get its value as a result. The second is where this method takes action, because the object does not have anything called fallback_test, so __getattr__ will run with that value. Inside that method, we placed the code that returns a string, and what we get is the result of that transformation.
The third example is interesting because a new attribute named fallback_new is created (actually, this call would be the same as running dyn.fallback_new = "new value"), so when we request that attribute, notice that the logic we put in __getattr__ does not apply, simply because that code is never called.
Now, the last example is the most interesting one. There is a subtle detail here that makes a huge difference. Take another look at the code in the __getattr__ method. Notice the exception it raises when the value is not retrievable: AttributeError. This is not only for consistency (as well as the message in the exception), but also required by the built-in getattr() function. Had this exception been any other, it would propagate, and the default value would not be returned.
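A minimal sketch of what goes wrong with any other exception type (the Bad class is deliberately contrived):
class Bad:
    def __getattr__(self, attr):
        raise KeyError(attr)  # the wrong exception type for this protocol

# getattr() only swallows AttributeError, so this raises KeyError
# instead of returning "default":
getattr(Bad(), "missing", "default")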
Be careful when implementing a method as dynamic as __getattr__, and use it with caution. When implementing __getattr__, raise AttributeError.
The __getattr__ magic method is useful in many situations. It can be used to create a proxy to another object. For example, if you're creating a wrapper object on top of another one by means of composition, and you want to delegate most of the methods to the wrapped object, instead of copying and defining all of those methods, you can implement __getattr__ so that it internally calls the same method on the wrapped object.
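A minimal sketch of that delegation idea (the LoggedWrapper class and its names are hypothetical):
class LoggedWrapper:
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, attr):
        # Called only for attributes not found on the wrapper itself,
        # so everything undefined here is forwarded transparently
        print(f"delegating {attr!r} to the wrapped object")
        return getattr(self._wrapped, attr)

numbers = LoggedWrapper([3, 1, 2])
numbers.sort()           # delegated to list.sort
print(numbers._wrapped)  # [1, 2, 3]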
Another example is when you know you need attributes that are dynamically computed. I've used it on a past project working with GraphQL (https://graphql.org/) with Graphene (https://graphene-python.org/). The way the library worked was by using resolver methods. Basically, every method named resolve_X was used when property X was requested. Since there were already domain objects that could resolve each property X in the class of the Graphene object, __getattr__ was implemented to know where to get each property from, without having to write massive boilerplate code.
Use the __getattr__ magic method when you see an opportunity to avoid lots of duplicated code and boilerplate, but don't abuse it, as it'll render the code harder to understand and reason about. Keep in mind that having attributes that aren't explicitly declared and just appear dynamically will make the code harder to follow. When using this method, you're always weighing code compactness against maintainability.
Callable objects
It is possible (and often convenient) to define objects that can act as functions. One of the most common applications for this is to create better decorators, but it's not limited to that.
The magic method __call__ will be called when we try to execute our object as if it were a regular function. Every argument passed to it will be passed along to the __call__ method.
The main advantage of implementing functions this way, through objects, is that objects have state, so we can save and maintain information across calls. This means that using a callable object might be a more convenient way of implementing functions if we need to maintain internal state across different calls. Examples of this are functions we would like to implement with memoization, or internal caches.
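For instance, a minimal sketch of memoization with a callable object (the Fibonacci class is hypothetical):
class Fibonacci:
    def __init__(self):
        self._cache = {0: 0, 1: 1}  # state preserved across calls

    def __call__(self, n):
        if n not in self._cache:
            self._cache[n] = self(n - 1) + self(n - 2)
        return self._cache[n]

fib = Fibonacci()
print(fib(10))  # 55; intermediate results stay cached for later calls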
When we have an object, a statement like object(*args, **kwargs) is translated in Python to object.__call__(*args, **kwargs).
This method is useful when we want to create callable objects that will work as parametrized functions, or in some cases, functions with memory.
The following listing uses this method to construct an object that, when called with a parameter, returns the number of times it has been called with the very same value:
from collections import defaultdict

class CallCount:
    def __init__(self):
        self._counts = defaultdict(int)

    def __call__(self, argument):
        self._counts[argument] += 1
        return self._counts[argument]
Some examples of this class in action are as follows:
>>> cc = CallCount()
>>> cc(1)
1
>>> cc(2)
1
>>> cc(1)
2
>>> cc(1)
3
>>> cc("something")
1
>>> callable(cc)
True
Later in this book, we will find out that this method comes in handy when creating decorators.
Summary of magic methods
We can summarize the concepts we described in the previous sections in the form of a cheat sheet like the one presented as follows. For each action in Python, the magic method involved is presented, along with the concept that it represents:

Statement                        Magic method                  Behavior
obj[key], obj[i:j], obj[i:j:k]   __getitem__(key)              Subscriptable object
with obj: ...                    __enter__ / __exit__          Context manager
for i in obj: ...                __iter__ / __next__           Iterable object
                                 __len__ / __getitem__         Sequence
obj.<attribute>                  __getattr__                   Dynamic attribute retrieval
obj(*args, **kwargs)             __call__(*args, **kwargs)     Callable object

Table 2.1: Magic methods and their behavior in Python
The best way to implement these methods correctly (and to know the set of methods that need to be implemented together) is to have our class inherit from the corresponding abstract base class defined in the collections.abc module (https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes). These interfaces provide the methods that need to be implemented, so it'll be easier for you to define the class correctly, and it'll also take care of creating the type correctly (something that works well when the isinstance() function is called on your object).
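A minimal sketch of this approach (the Inventory class is hypothetical):
from collections.abc import Sized

class Inventory(Sized):
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):  # the only method Sized requires
        return len(self._items)

inventory = Inventory(["sword", "shield"])
print(len(inventory))                # 2
print(isinstance(inventory, Sized))  # True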
We have seen the main features of Python with respect to its peculiar syntax. With the features we have learned (context managers, callable objects, creating our own sequences, and suchlike), we are now able to write code that will blend well with Python's reserved words (for example, we can use the with statement with our own context managers, or the in operator with a container of our own).
With practice and experience, you'll become more fluent with these features of Python, until it becomes second nature for you to wrap the logic you're writing behind abstractions with nice and small interfaces. Give it enough time, and the reverse effect will take place: Python will start programming you. That is, you'll naturally think of having small, clean interfaces in your programs, so even when you're creating software in a different language, you'll try to use these concepts. For example, if you find yourself programming in, let's say, Java or C (or even Bash), you might identify a scenario where a context manager might be useful. Now the language itself might not support this out of the box, but that might not stop you from writing your own abstraction that provides similar guarantees. And that's a good thing. It means you have internalized good concepts beyond a specific language, and you can apply them in different situations.
All programming languages have their caveats, and Python is no exception, so in order to have a more complete understanding of Python, we'll review some of them in the next section.