Indexes and slices
In Python, as in other languages, some data structures or types support accessing their elements by index. Another thing Python has in common with most programming languages is that the first element is at index 0. However, unlike those languages, Python provides extra features for accessing elements in a different order than usual.
For example, how would you access the last element of an array in C? This is something I did the first time I tried Python. Thinking the same way as in C, I would get the element in the position of the length of the array minus one. In Python, this would work too, but we could also use a negative index number, which will start counting from the last element, as shown in the following commands:
>>> my_numbers = (4, 5, 3, 9)
>>> my_numbers[-1]
9
>>> my_numbers[-3]
5
This is an example of the preferred (Pythonic) way of doing things.
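To make the equivalence explicit, here is a minimal sketch that compares the C-style computation with the negative index. The variable names are illustrative only.

```python
my_numbers = (4, 5, 3, 9)

# C-style: compute the last position from the length.
last_by_length = my_numbers[len(my_numbers) - 1]

# Pythonic: count backward from the end.
last_by_negative_index = my_numbers[-1]

# For a sequence of length n, index -k refers to the same element as n - k.
third_from_end = my_numbers[len(my_numbers) - 3]
```

Both forms select the same element; the negative index simply removes the arithmetic.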
In addition to getting just one element, we can obtain many by using a slice, as shown in the following commands:
>>> my_numbers = (1, 1, 2, 3, 5, 8, 13, 21)
>>> my_numbers[2:5]
(2, 3, 5)
In this case, the syntax inside the square brackets means that we get all of the elements of the tuple, starting from the index of the first number (inclusive), up to the index of the second one (not including it). Slices in Python always work this way: the end of the selected interval is excluded.
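One practical consequence of excluding the end can be sketched as follows (the variable names are assumptions made for illustration): with in-range bounds, the length of the result is the difference between the two numbers, and two slices that share a boundary never overlap.

```python
my_numbers = (1, 1, 2, 3, 5, 8, 13, 21)

start, stop = 2, 5
selected = my_numbers[start:stop]

# With in-range bounds, the slice holds exactly stop - start elements.
slice_length = len(selected)

# Adjacent slices that share a boundary rebuild the original without overlap.
rebuilt = my_numbers[:stop] + my_numbers[stop:]
```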
You can omit either end of the interval, start or stop, and in that case the slice will run from the beginning or up to the end of the sequence, respectively, as shown in the following commands:
>>> my_numbers[:3]
(1, 1, 2)
>>> my_numbers[3:]
(3, 5, 8, 13, 21)
>>> my_numbers[::] # also my_numbers[:], returns a copy
(1, 1, 2, 3, 5, 8, 13, 21)
>>> my_numbers[1:7:2]
(1, 3, 8)
In the first example, it will get everything up to the index in position 3 (not including it). In the second example, it will get all the numbers from position 3 (inclusive) up to the end. In the third example, where both ends are omitted, it returns the whole sequence; for a mutable sequence such as a list this produces a new (shallow) copy, whereas for an immutable tuple CPython can simply return the same object.
The last example includes a third parameter, which is the step. This indicates how many elements to jump when iterating over the interval. In this case, it would mean getting the elements between the positions one and seven, jumping by two.
In all of these cases, when we pass intervals to a sequence, what we are actually passing is a slice. Note that slice is a built-in object in Python that you can build yourself and pass directly:
>>> interval = slice(1, 7, 2)
>>> my_numbers[interval]
(1, 3, 8)
>>> interval = slice(None, 3)
>>> my_numbers[interval] == my_numbers[:3]
True
Notice that when one of the elements is missing (start, stop, or step), it is considered to be None.
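The None defaults can be inspected directly on the slice object. The short sketch below also uses slice.indices(), a standard method that resolves those missing values against a concrete sequence length; the length of 8 is chosen here only to match the tuple used earlier.

```python
interval = slice(None, 3)

# Omitted parts are stored as None on the slice object.
start_is_missing = interval.start is None
step_is_missing = interval.step is None

# indices(length) returns the concrete (start, stop, step) that the slice
# would use on a sequence of that length.
resolved = interval.indices(8)
```

Here resolved is (0, 3, 1): the missing start becomes 0 and the missing step becomes 1.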
You should always prefer to use this built-in syntax for slices, as opposed to manually iterating over the tuple, string, or list inside a for loop and excluding the elements by hand.
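As a rough comparison, the following sketch selects the same elements twice: once with a hand-written loop and once with a slice. The loop is written only to illustrate what the slice replaces, and its variable names are assumptions.

```python
my_numbers = (1, 1, 2, 3, 5, 8, 13, 21)

# Manual version: track positions and filter them by hand.
selected_by_hand = []
for position, number in enumerate(my_numbers):
    if 1 <= position < 7 and (position - 1) % 2 == 0:
        selected_by_hand.append(number)
selected_by_hand = tuple(selected_by_hand)

# Slice version: the same selection in a single expression.
selected_by_slice = my_numbers[1:7:2]
```

The manual version has to get the bounds, the step, and the result type right on its own, whereas the slice states the intent directly.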
Creating your own sequences
The functionality we just discussed works thanks to a magic method called __getitem__ (magic methods are those surrounded by double underscores, which Python reserves for special behavior). This is the method that is called when an expression like myobject[key] is evaluated, with the key (the value inside the square brackets) passed as a parameter. A sequence, in particular, is an object that implements both __getitem__ and __len__, and for this reason, it can be iterated over. Lists, tuples, and strings are examples of sequence objects in the standard library.
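To see what __getitem__ actually receives, consider a small sketch with a hypothetical class, KeyEcho, that returns its key unchanged. It is not a sequence; it exists only to expose the argument.

```python
class KeyEcho:
    """Hypothetical helper: returns whatever was placed inside the brackets."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()

single = echo[2]        # a plain integer index
interval = echo[1:7:2]  # the colon syntax arrives as a slice object
open_ended = echo[:3]   # omitted parts arrive as None
```

An integer index is passed through as is, while the colon syntax is turned into a slice object before the method is called, which is why a custom __getitem__ usually has to handle both cases.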
In this section, we care more about getting particular elements from an object by a key than building sequences or iterable objects, which is a topic explored in Chapter 7, Generators, Iterators, and Asynchronous Programming.
If you are going to implement __getitem__ in a custom class in your domain, you will have to take into account some considerations in order to follow a Pythonic approach.
If your class is a wrapper around a standard library object, you might as well delegate the behavior as much as possible to the underlying object. This means that if your class is actually a wrapper around a list, it should call the same methods on that list to make sure that it remains compatible. In the following listing, we can see an example of how an object wraps a list, and for the methods we are interested in, we just delegate to the corresponding version on the list object:
from collections.abc import Sequence


class Items(Sequence):
    def __init__(self, *values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, item):
        # Delegate both integer indexes and slices to the wrapped list.
        return self._values.__getitem__(item)
To declare that our class is a sequence, it implements the Sequence interface from the collections.abc module (https://docs.python.org/3/library/collections.abc.html). For the classes you write that are intended to behave as standard types of objects (containers, mappings, and so on), it's a good idea to implement the interfaces from this module, because that reveals the intention of what that class is meant to be, and also because using the interfaces will force you to implement the required methods.
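The sketch below illustrates both benefits under simple assumptions. It repeats the Items class so that it runs on its own, and adds a hypothetical Incomplete class that deliberately omits __getitem__.

```python
from collections.abc import Sequence


class Items(Sequence):
    def __init__(self, *values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, item):
        return self._values.__getitem__(item)


items = Items(1, 1, 2, 3, 5)

# Sequence supplies these on top of __len__ and __getitem__.
contains_three = 3 in items
position_of_five = items.index(5)
occurrences_of_one = items.count(1)
backwards = list(reversed(items))


class Incomplete(Sequence):
    """Hypothetical class that forgets to define __getitem__."""

    def __len__(self):
        return 0


try:
    Incomplete()
except TypeError:
    # Abstract methods are still missing, so instantiation is refused.
    instantiation_failed = True
else:
    instantiation_failed = False
```

Membership tests, index, count, and reversed all come from the Sequence base class, and the incomplete class cannot be instantiated until every required method is defined.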
This example uses composition (because it contains an internal collaborator that is a list, rather than inheriting from the list class). Another way of doing it is through class inheritance, in which case we will have to extend the collections.UserList base class, with the considerations and caveats mentioned in the last part of this chapter.
If, however, you are implementing your own sequence that is not a wrapper or does not rely on any built-in object underneath, then keep in mind the following points:
- When indexing by a range, the result should be an instance of the same type as the class
- In the range provided by slice, respect the semantics that Python uses, excluding the element at the end
The first point guards against a subtle error. Think about it: when you get a slice of a list, the result is a list; when you ask for a range in a tuple, the result is a tuple; and when you ask for a substring, the result is a string. It makes sense in each case that the result is of the same type as the original object. If you are creating, let's say, an object that represents an interval of dates, and you ask for a range on that interval, it would be a mistake to return a list or tuple, or something else. Instead, it should return a new instance of the same class with the new interval set. The best example of this is in the standard library, with the range function. If you call range with an interval, it will construct an iterable object that knows how to produce the values in the selected range. When you specify an interval for range, you get a new range (which makes sense), not a list:
>>> range(1, 100)[25:50]
range(26, 51)
The second rule is also about consistency: users of your code will find it more familiar and easier to use if it is consistent with Python itself. As Python developers, we are already used to the idea of how the slices work, how the range function works, and so on. Making an exception on a custom class will create confusion, which means that it will be harder to remember, and it might lead to bugs.
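Both rules can be sketched with a hypothetical DateRange class that computes its dates on demand instead of wrapping a list. The class name, its constructor parameters, and the restriction to a step of 1 are assumptions made to keep the example short; this is one possible approach, not a reference implementation.

```python
from collections.abc import Sequence
from datetime import date, timedelta


class DateRange(Sequence):
    """Hypothetical sequence of consecutive days, computed on demand."""

    def __init__(self, first, length):
        self._first = first
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, key):
        if isinstance(key, slice):
            # slice.indices() applies Python's own rules: missing bounds,
            # negative values, clamping, and an excluded stop.
            start, stop, step = key.indices(len(self))
            if step != 1:
                raise ValueError("this sketch only supports a step of 1")
            new_length = max(0, stop - start)
            # Rule 1: a slice yields a new instance of the same class.
            return type(self)(self._first + timedelta(days=start), new_length)
        index = key + len(self) if key < 0 else key
        if not 0 <= index < len(self):
            raise IndexError("DateRange index out of range")
        return self._first + timedelta(days=index)


january = DateRange(date(2024, 1, 1), 31)
first_week = january[:7]
```

Here first_week is another DateRange covering January 1 through January 7; the day at position 7 is excluded, just as it would be when slicing a list.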
Now that we know about indices and slices, and how to create our own, in the next section, we'll take the same approach but for context managers. First, we'll see how context managers from the standard library work, and then we'll go to the next level and create our own.