Creating Python classes
We don't have to write much Python code to realize that Python is a very clean language. When we want to do something, we can just do it, without having to set up a bunch of prerequisite code. The ubiquitous hello world in Python, as you've likely seen, is only one line.
Similarly, the simplest class in Python 3 looks like this:
class MyFirstClass:
    pass
There's our first object-oriented program! The class definition starts with the class keyword. This is followed by a name (of our choice) identifying the class, and the line is terminated with a colon.
The class name must follow standard Python variable naming rules (it must start with a letter or underscore, and can contain only letters, underscores, or digits). In addition, the Python style guide (search the web for PEP 8) recommends that classes be named using what PEP 8 calls CapWords notation (start with a capital letter; any subsequent words should also start with a capital).
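The naming rules can be checked from inside Python itself. The following is only a sketch: the helper names is_valid_class_name and looks_like_capwords are hypothetical, and the CapWords check is a rough approximation of the PEP 8 convention, not an official test.

```python
import keyword


def is_valid_class_name(name: str) -> bool:
    """Return True if name is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


def looks_like_capwords(name: str) -> bool:
    """Rough CapWords check: starts with a capital letter, no underscores."""
    return name[:1].isupper() and "_" not in name


for candidate in ["MyFirstClass", "my_first_class", "2ndClass", "class"]:
    print(candidate, is_valid_class_name(candidate), looks_like_capwords(candidate))
```

A name such as my_first_class is legal, so Python accepts it; it simply doesn't follow the recommended style for classes.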
The class definition line is followed by the class contents, indented. As with other Python constructs, indentation is used to delimit the classes, rather than braces, keywords, or brackets, as many other languages use. Also, in line with the style guide, use four spaces for indentation unless you have a compelling reason not to (such as fitting in with somebody else's code that uses tabs for indents).
Since our first class doesn't actually add any data or behaviors, we simply use the pass keyword on the second line as a placeholder to indicate that no further action needs to be taken.
We might think there isn't much we can do with this most basic class, but it does allow us to instantiate objects of that class. We can load the class into the Python 3 interpreter, so we can interactively play with it. To do this, save the class definition mentioned earlier in a file named first_class.py and then run the python -i first_class.py command. The -i argument tells Python to run the code and then drop to the interactive interpreter. The following interpreter session demonstrates a basic interaction with this class:
>>> a = MyFirstClass()
>>> b = MyFirstClass()
>>> print(a)
<__main__.MyFirstClass object at 0xb7b7faec>
>>> print(b)
<__main__.MyFirstClass object at 0xb7b7fbac>
This code instantiates two objects from the new class, assigning the objects to the variable names a and b. Creating an instance of a class is a matter of typing the class name, followed by a pair of parentheses. It looks much like a function call; calling a class will create a new object. When printed, the two objects tell us which class they are and what memory address they live at. Memory addresses aren't used much in Python code, but here, they demonstrate that there are two distinct objects involved.
We can see they're distinct objects by using the is operator:
>>> a is b
False
This can help reduce confusion when we've created a bunch of objects and assigned different variable names to the objects.
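The opposite case is worth seeing too: two variable names that refer to one object. The following sketch adds a third name, c, which is an assumption for illustration and not part of the session above.

```python
class MyFirstClass:
    pass


a = MyFirstClass()
b = MyFirstClass()
c = a  # a second name for the same object, not a copy

print(a is b)          # two separate instances
print(a is c)          # both names refer to one object
print(id(a) == id(c))  # id() reports the same identity for both names
```

Assignment never copies an object; it only attaches another name to it, which is why the is operator is the right tool for sorting out which names share an object.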
Adding attributes
Now, we have a basic class, but it's fairly useless. It doesn't contain any data, and it doesn't do anything. What do we have to do to assign an attribute to a given object?
In fact, we don't have to do anything special in the class definition to be able to add attributes. We can set arbitrary attributes on an instantiated object using dot notation. Here's an example:
class Point:
    pass

p1 = Point()
p2 = Point()
p1.x = 5
p1.y = 4
p2.x = 3
p2.y = 6
print(p1.x, p1.y)
print(p2.x, p2.y)
If we run this code, the two print statements at the end tell us the new attribute values on the two objects:
5 4
3 6
This code creates an empty Point class with no data or behaviors. Then, it creates two instances of that class and assigns each of those instances x and y coordinates to identify a point in two dimensions. All we need to do to assign a value to an attribute on an object is use the <object>.<attribute> = <value> syntax. This is sometimes referred to as dot notation. The value can be anything: a Python primitive, a built-in data type, or another object. It can even be a function or another class!
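To make the "value can be anything" point concrete, here is a small sketch. The attribute names tags, neighbor, and describe, and the describe() function, are invented for this illustration.

```python
class Point:
    pass


def describe() -> str:
    return "a point in two dimensions"


p1 = Point()
p1.x = 5                # an int
p1.tags = ["a", "b"]    # a built-in collection
p1.neighbor = Point()   # another object
p1.describe = describe  # a plain function stored as an attribute

print(vars(p1).keys())  # the attribute names now held by this one instance
print(p1.describe())
```

Note that a function stored on an instance this way is called exactly as written; it does not receive the object as an automatic first argument the way a method defined in the class body does.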
Creating attributes like this is confusing to the mypy tool. There's no easy way to include the hints in the Point class definition. We can include hints on the assignment statements, like this: p1.x: float = 5. In general, there's a much, much better approach to type hints and attributes that we'll examine in the Initializing the object section, later in this chapter. First, though, we'll add behaviors to our class definition.
Making it do something
Now, having objects with attributes is great, but object-oriented programming is really about the interaction between objects. We're interested in invoking actions that cause things to happen to those attributes. We have data; now it's time to add behaviors to our classes.
Let's model a couple of actions on our Point class. We can start with a method called reset, which moves the point to the origin (the origin is the place where x and y are both zero). This is a good introductory action because it doesn't require any parameters:
class Point:
    def reset(self):
        self.x = 0
        self.y = 0

p = Point()
p.reset()
print(p.x, p.y)
This print statement shows us the two zeros on the attributes:
0 0
In Python, a method is formatted identically to a function. It starts with the def keyword, followed by a space, and the name of the method. This is followed by a set of parentheses containing the parameter list (we'll discuss that self parameter, sometimes called the instance variable, in just a moment), and terminated with a colon. The next line is indented to contain the statements inside the method. These statements can be arbitrary Python code operating on the object itself and any parameters passed in, as the method sees fit.
We've omitted type hints in the reset() method because it's not the most widely used place for hints. We'll look at the best place for hints in the Initializing the object section. First, we'll look a little more at these instance variables, and how the self variable works.
Talking to yourself
The one difference, syntactically, between methods of classes and functions outside classes is that methods have one required argument. This argument is conventionally named self; I've never seen a Python programmer use any other name for this variable (convention is a very powerful thing). There's nothing technically stopping you, however, from calling it this or even Martha, but it's best to acknowledge the social pressure of the Python community codified in PEP 8 and stick with self.
The self argument to a method is a reference to the object that the method is being invoked on. The object is an instance of a class, and this is sometimes called the instance variable.
We can access attributes and methods of that object via this variable. This is exactly what we do inside the reset method when we set the x and y attributes of the self object.
Pay attention to the difference between a class and an object in this discussion. We can think of the method as a function attached to a class. The self parameter refers to a specific instance of the class. When you call the method on two different objects, you are calling the same method twice, but passing two different objects as the self parameter.
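One way to see "the same method, two different objects" is to inspect the bound method objects Python creates on attribute lookup. This is a sketch for illustration; __self__ and __func__ are standard attributes of bound methods, but peeking at them is rarely needed in everyday code.

```python
class Point:
    def reset(self):
        self.x = 0
        self.y = 0


p1 = Point()
p2 = Point()

# Each bound method remembers the instance it was looked up on...
print(p1.reset.__self__ is p1)
print(p2.reset.__self__ is p2)
# ...while both share the single function defined in the class body.
print(p1.reset.__func__ is p2.reset.__func__)
```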
Notice that when we call the p.reset() method, we do not explicitly pass the self argument into it. Python automatically takes care of this part for us. It knows we're calling a method on the p object, so it automatically passes that object, p, to the method of the class, Point.
For some, it can help to think of a method as a function that happens to be part of a class. Instead of calling the method on the object, we could invoke the function as defined in the class, explicitly passing our object as the self argument:
>>> p = Point()
>>> Point.reset(p)
>>> print(p.x, p.y)
The output is the same as in the previous example because, internally, the exact same process has occurred. This is not really a good programming practice, but it can help to cement your understanding of the self argument.
What happens if we forget to include the self argument in our class definition? Python will bail with an error message, as follows:
>>> class Point:
...     def reset():
...         pass
...
>>> p = Point()
>>> p.reset()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: reset() takes 0 positional arguments but 1 was given
The error message is not as clear as it could be ("Hey, silly, you forgot to define the method with a self parameter" could be more informative). Just remember that when you see an error message complaining about an unexpected number of arguments, the first thing to check is whether you forgot the self parameter in the method definition.
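The same mistake can be reproduced in a script, where the exception can be caught and its message examined. This is a minimal sketch; the class name BrokenPoint is hypothetical, and the exact wording of the message may vary slightly between Python versions.

```python
class BrokenPoint:  # hypothetical class with a deliberately wrong method
    def reset():  # self is missing from the parameter list
        pass


p = BrokenPoint()
try:
    p.reset()  # Python still passes p as the first argument
except TypeError as error:
    message = str(error)
    print(message)
```

The "1 was given" part of the message refers to the object itself, which Python supplied automatically even though the call site shows empty parentheses.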
More arguments
How do we pass multiple arguments to a method? Let's add a new method that allows us to move a point to an arbitrary position, not just to the origin. We can also include a method that accepts another Point object as input and returns the distance between them:
import math

class Point:
    def move(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def reset(self) -> None:
        self.move(0, 0)

    def calculate_distance(self, other: "Point") -> float:
        return math.hypot(self.x - other.x, self.y - other.y)
We've defined a class with two attributes, x and y, and three separate methods, move(), reset(), and calculate_distance().
The move() method accepts two arguments, x and y, and sets the values on the self object. The reset() method calls the move() method, since a reset is just a move to a specific known location.
The calculate_distance() method computes the Euclidean distance between two points. (There are a number of other ways to look at distance. In the Chapter 3, When Objects Are Alike, case study, we'll look at some alternatives.) For now, we hope you understand the math. The definition is $\sqrt{(x_s - x_o)^2 + (y_s - y_o)^2}$, which is the math.hypot() function. In Python we'll use self.x, but mathematicians often prefer to write $x_s$.
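As a quick check that math.hypot() really is the square root of the sum of the squared coordinate differences, the two can be computed side by side. The variable names dx, dy, by_formula, and by_hypot are chosen only for this sketch.

```python
import math

# Differences in x and y between two example points, (3, 4) and (0, 0).
dx = 3.0 - 0.0
dy = 4.0 - 0.0

by_formula = math.sqrt(dx ** 2 + dy ** 2)  # the definition written out
by_hypot = math.hypot(dx, dy)              # the library function
print(by_formula, by_hypot)
```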
Here's an example of using this class definition. This shows how to call a method with arguments: include the arguments inside the parentheses and use the same dot notation to access the method name within the instance. We just picked some random positions to test the methods. The test code calls each method and prints the results on the console:
>>> point1 = Point()
>>> point2 = Point()
>>> point1.reset()
>>> point2.move(5, 0)
>>> print(point2.calculate_distance(point1))
5.0
>>> assert point2.calculate_distance(point1) == point1.calculate_distance(
...     point2
... )
>>> point1.move(3, 4)
>>> print(point1.calculate_distance(point2))
4.47213595499958
>>> print(point1.calculate_distance(point1))
0.0
The assert statement is a marvelous test tool; the program will bail if the expression after assert evaluates to False (or zero, empty, or None). In this case, we use it to ensure that the distance is the same regardless of which point called the other point's calculate_distance() method. We'll see a lot more use of assert in Chapter 13, Testing Object-Oriented Programs, where we'll write more rigorous tests.
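When an assert fails, Python raises an AssertionError, and an optional second expression supplies the message. The sketch below uses a hypothetical helper, check_symmetric, and compares with math.isclose() rather than ==, which is a design choice of this example and not something the text above prescribes. It also assumes Python is run normally; with the -O option, assert statements are skipped.

```python
import math


def check_symmetric(d_ab: float, d_ba: float) -> None:  # hypothetical helper
    # The expression after the comma becomes the AssertionError message.
    assert math.isclose(d_ab, d_ba), f"distances differ: {d_ab} != {d_ba}"


check_symmetric(5.0, 5.0)  # passes silently

try:
    check_symmetric(5.0, 4.0)
except AssertionError as error:
    failure = str(error)
    print(failure)
```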
Initializing the object
If we don't explicitly set the x and y positions on our Point object, either using move or by accessing them directly, we'll have a broken Point object with no real position. What will happen when we try to access it?
Well, let's just try it and see. Try it and see is an extremely useful tool for Python study. Open up your interactive interpreter and type away. (Using the interactive prompt is, after all, one of the tools we used to write this book.)
The following interactive session shows what happens if we try to access a missing attribute. If you saved the previous example as a file or are using the examples distributed with the book, you can load it into the Python interpreter with the python -i more_arguments.py command:
>>> point = Point()
>>> point.x = 5
>>> print(point.x)
5
>>> print(point.y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Point' object has no attribute 'y'
Well, at least it threw a useful exception. We'll cover exceptions in detail in Chapter 4, Expecting the Unexpected. You've probably seen them before (especially the ubiquitous SyntaxError, which means you typed something incorrectly!). At this point, simply be aware that it means something went wrong.
The output is useful for debugging. In the interactive interpreter, it tells us the error occurred at line 1, which is only partially true (in an interactive session, only one statement is executed at a time). If we were running a script in a file, it would tell us the exact line number, making it easy to find the offending code. In addition, it tells us that the error is an AttributeError, and gives a helpful message telling us what that error means.
We can catch and recover from this error, but in this case, it feels like we should have specified some sort of default value. Perhaps every new object should be reset() by default, or maybe it would be nice if we could force the user to tell us what those positions should be when they create the object.
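For reference, catching and recovering from the error might look like the following sketch. The fallback value of 0 is an assumption made for the illustration; the built-in getattr() and hasattr() functions offer the same recovery without a try statement.

```python
class Point:
    pass


point = Point()
point.x = 5

try:
    y = point.y
except AttributeError:
    y = 0  # assumed fallback value for this illustration

print(point.x, y)
print(getattr(point, "y", 0))  # same fallback without try/except
print(hasattr(point, "y"))     # the attribute is still missing
```

Each of these works, but every piece of code that touches a Point would need the same defensive check, which is why setting the values when the object is created is the more attractive design.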
Interestingly, mypy can't determine whether y is supposed to be an attribute of a Point object. Attributes are, by definition, dynamic, so there's no simple list that's part of a class definition. However, Python has some widely followed conventions that can help name the expected set of attributes.
Most object-oriented programming languages have the concept of a constructor, a special method that creates and initializes the object when it is created. Python is a little different; it has a constructor and an initializer. The constructor method, __new__(), is rarely used unless you're doing something very exotic. So, we'll start our discussion with the much more common initialization method, __init__().
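Although __new__() is rarely written by hand, a short sketch can show the order in which the two methods run. The class name Traced and the calls list are hypothetical, used only to record what happens.

```python
calls = []


class Traced:  # hypothetical class that records the creation steps
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")       # step 1: the object is created
        return super().__new__(cls)

    def __init__(self, x: float, y: float) -> None:
        calls.append("__init__")      # step 2: the new object is initialized
        self.x = x
        self.y = y


t = Traced(3, 5)
print(calls)
```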
The Python initialization method is the same as any other method, except it has a special name, __init__. The leading and trailing double underscores mean this is a special method that the Python interpreter will treat as a special case.
Never name a method of your own with leading and trailing double underscores. It may mean nothing to Python today, but there's always the possibility that the designers of Python will add a function that has a special purpose with that name in the future. When they do, your code will break.
Let's add an initialization function on our Point class that requires the user to supply x and y coordinates when the Point object is instantiated:
import math

class Point:
    def __init__(self, x: float, y: float) -> None:
        self.move(x, y)

    def move(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def reset(self) -> None:
        self.move(0, 0)

    def calculate_distance(self, other: "Point") -> float:
        return math.hypot(self.x - other.x, self.y - other.y)
Constructing a Point instance now looks like this:
point = Point(3, 5)
print(point.x, point.y)
Now, our Point object can never go without both x and y coordinates! If we try to construct a Point instance without including the proper initialization parameters, it will fail with a not enough arguments error similar to the one we received earlier when we forgot the self argument in a method definition.
Most of the time, we put our initialization statements in an __init__() method. It's very important to be sure that all of the attributes are initialized in the __init__() method. Doing this helps the mypy tool by providing all of the attributes in one obvious place. It also helps people reading your code; it saves them from having to read the whole application to find mysterious attributes set outside the class definition.
While they're optional, it's generally helpful to include type annotations on the method parameters and result values. After each parameter name, we've included the expected type of each value. At the end of the definition, we've included the two-character -> operator and the type returned by the method.
Type hints and defaults
As we've noted a few times now, hints are optional. They don't do anything at runtime. There are tools, however, that can examine the hints to check for consistency. The mypy tool is widely used to check type hints.
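To see that hints have no effect at runtime, we can deliberately pass values of the wrong type. This is a sketch for illustration; the string arguments are intentionally incorrect, and a checker such as mypy would be expected to flag the call even though Python runs it.

```python
class Point:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


p = Point("three", "five")  # wrong types, yet no error at runtime
print(p.x, p.y)

# The hints are simply stored on the function for tools to read.
print(Point.__init__.__annotations__)
```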
If we don't want to make the two arguments required, we can use the same syntax Python functions use to provide default arguments. The keyword argument syntax appends an equals sign after each variable name. If the calling object does not provide this argument, then the default argument is used instead. The variables will still be available to the function, but they will have the values specified in the argument list. Here's an example:
class Point:
    def __init__(self, x: float = 0, y: float = 0) -> None:
        self.move(x, y)
The definitions for the individual parameters can get long, leading to very long lines of code. In some examples, you'll see this single logical line of code expanded to multiple physical lines. This relies on the way Python joins physical lines until an opening ( is matched by its closing ). We might write this when the line gets long:
class Point:
    def __init__(
        self,
        x: float = 0,
        y: float = 0
    ) -> None:
        self.move(x, y)
This style isn't used very often, but it's valid and keeps the lines shorter and easier to read.
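With defaults in place, a Point can be created with no arguments, one, or both, and either coordinate can be named explicitly. The sketch below sets the attributes directly in __init__() to stay self-contained, and the variable names origin, on_x_axis, and on_y_axis are chosen only for this example.

```python
class Point:
    def __init__(self, x: float = 0, y: float = 0) -> None:
        self.x = x
        self.y = y


origin = Point()        # both defaults are used
on_x_axis = Point(3)    # x is given positionally, y defaults to 0
on_y_axis = Point(y=4)  # y is given by keyword, x defaults to 0

for point in (origin, on_x_axis, on_y_axis):
    print(point.x, point.y)
```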
The type hints and defaults are handy, but there's even more we can do to provide a class that's easy to use and easy to extend when new requirements arise. We'll add documentation in the form of docstrings.
Explaining yourself with docstrings
Python can be an extremely easy-to-read programming language; some might say it is self-documenting. However, when carrying out object-oriented programming, it is important to write API documentation that clearly summarizes what each object and method does. Keeping documentation up to date is difficult; the best way to do it is to write it right into our code.
Python supports this through the use of docstrings. Each class, function, or method can have a standard Python string as the first indented line inside the definition, immediately after the header (the line that ends in a colon).
Docstrings are Python strings enclosed within apostrophes (') or quotation marks ("). Often, docstrings are quite long and span multiple lines (the style guide recommends limiting code lines to 79 characters, and docstring and comment lines to 72), which can be formatted as multi-line strings, enclosed in matching triple apostrophe (''') or triple quote (""") characters.
A docstring should clearly and concisely summarize the purpose of the class or method it is describing. It should explain any parameters whose usage is not immediately obvious, and is also a good place to include short examples of how to use the API. Any caveats or problems an unsuspecting user of the API should be aware of should also be noted.
One of the best things to include in a docstring is a concrete example. Tools like doctest can locate and confirm these examples are correct. All the examples in this book are checked with the doctest tool.
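As a small illustration of how a docstring example gets checked, the doctest module can also be driven from code rather than from the command line. The function double() is hypothetical; DocTestParser and DocTestRunner are part of the standard doctest module, though most projects just run the module from the command line.

```python
import doctest


def double(n: int) -> int:  # hypothetical function with a docstring example
    """
    Return twice the argument.

    >>> double(21)
    42
    """
    return 2 * n


# Extract the example from the docstring, run it, and collect the counts.
parser = doctest.DocTestParser()
test = parser.get_doctest(double.__doc__, {"double": double}, "double", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.attempted, results.failed)
```

If the expected value in the docstring were changed to something wrong, the runner would report the mismatch and the failed count would be nonzero.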
To illustrate the use of docstrings, we will end this section with our completely documented Point class:
import math

class Point:
    """
    Represents a point in two-dimensional geometric coordinates

    >>> p_0 = Point()
    >>> p_1 = Point(3, 4)
    >>> p_0.calculate_distance(p_1)
    5.0
    """

    def __init__(self, x: float = 0, y: float = 0) -> None:
        """
        Initialize the position of a new point. The x and y
        coordinates can be specified. If they are not, the
        point defaults to the origin.

        :param x: float x-coordinate
        :param y: float y-coordinate
        """
        self.move(x, y)

    def move(self, x: float, y: float) -> None:
        """
        Move the point to a new location in 2D space.

        :param x: float x-coordinate
        :param y: float y-coordinate
        """
        self.x = x
        self.y = y

    def reset(self) -> None:
        """
        Reset the point back to the geometric origin: 0, 0
        """
        self.move(0, 0)

    def calculate_distance(self, other: "Point") -> float:
        """
        Calculate the Euclidean distance from this point
        to a second point passed as a parameter.

        :param other: Point instance
        :return: float distance
        """
        return math.hypot(self.x - other.x, self.y - other.y)
Try typing or loading (remember, it's python -i point.py) this file into the interactive interpreter. Then, enter help(Point)<enter> at the Python prompt.
You should see nicely formatted documentation for the class, as shown in the following output:
Help on class Point in module point_2:

class Point(builtins.object)
 |  Point(x: float = 0, y: float = 0) -> None
 |
 |  Represents a point in two-dimensional geometric coordinates
 |
 |  >>> p_0 = Point()
 |  >>> p_1 = Point(3, 4)
 |  >>> p_0.calculate_distance(p_1)
 |  5.0
 |
 |  Methods defined here:
 |
 |  __init__(self, x: float = 0, y: float = 0) -> None
 |      Initialize the position of a new point. The x and y
 |      coordinates can be specified. If they are not, the
 |      point defaults to the origin.
 |
 |      :param x: float x-coordinate
 |      :param y: float y-coordinate
 |
 |  calculate_distance(self, other: 'Point') -> float
 |      Calculate the Euclidean distance from this point
 |      to a second point passed as a parameter.
 |
 |      :param other: Point instance
 |      :return: float distance
 |
 |  move(self, x: float, y: float) -> None
 |      Move the point to a new location in 2D space.
 |
 |      :param x: float x-coordinate
 |      :param y: float y-coordinate
 |
 |  reset(self) -> None
 |      Reset the point back to the geometric origin: 0, 0
 |
 |  ----------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)
Not only is our documentation every bit as polished as the documentation for built-in functions, but we can run python -m doctest point_2.py to confirm the example shown in the docstring.
Further, we can also run mypy to check the type hints. Use mypy --strict src/*.py to check all of the files in the src folder. If there are no problems, the mypy application doesn't produce any output. (Remember, mypy is not part of the standard installation, so you'll need to add it. Check the preface for information on extra packages that need to be installed.)