Writing Python script and module files – syntax basics
We'll need to write Python script files in order to do anything that's fully automated. We can experiment with the language at the interactive >>> prompt. We can also use JupyterLab interactively. For automated work, however, we'll need to create and run script files.
How can we make sure our code matches what's in common use? We need to look at some common aspects of style: how we organize our programming to make it readable.
We'll also look at a number of more technical considerations. For example, we need to be sure to save our files in UTF-8 encoding. While ASCII encoding is still supported by Python, it's a poor choice for modern programming. We'll also need to be sure to use spaces instead of tabs. If we use Unix newlines as much as possible, we'll also find it slightly simpler to create software that runs on a variety of operating systems.
Most text editing tools will work properly with Unix (newline) line endings as well as Windows or DOS (return-newline) line endings. Any tool that can't work with both kinds of line endings should be avoided.
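As a rough illustration of the encoding and line-ending advice above, the following sketch saves a small script as UTF-8 with Unix newlines and then checks the result. The helper names write_script and has_windows_newlines are made up for this example; they are not part of any library.

```python
from pathlib import Path
import tempfile


def write_script(path: Path, text: str) -> None:
    """Save text as UTF-8 with Unix line endings on any platform."""
    # newline="\n" stops Python from translating "\n" to the platform default.
    with path.open("w", encoding="utf-8", newline="\n") as target:
        target.write(text)


def has_windows_newlines(path: Path) -> bool:
    """Return True if the file contains any return-newline pairs."""
    return b"\r\n" in path.read_bytes()


# Hypothetical demo file, written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    demo = Path(folder) / "demo.py"
    write_script(demo, 'print("hello world")\n')
    unix_style = not has_windows_newlines(demo)

print(unix_style)
```

A good editor does this for us on every save; the sketch only shows what the settings amount to.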
Getting ready
To edit Python scripts, we'll need a good programming text editor. Python comes with a handy editor, IDLE. It works well for simple projects. It lets us jump back and forth between a file and an interactive >>> prompt, but it's not a good programming editor for larger projects.
There are dozens of programming editors. It's nearly impossible to suggest just one. So we'll suggest a few.
The JetBrains PyCharm editor has numerous features. The community edition version is free. See https://www.jetbrains.com/pycharm/download/.
ActiveState has Komodo IDE, which is also very sophisticated. The Komodo Edit version is free and does some of the same things as the full Komodo IDE. See http://komodoide.com/komodo-edit/.
Notepad++ is good for Windows developers. See https://notepad-plus-plus.org.
BBEdit is very nice for macOS developers. See http://www.barebones.com/products/bbedit/.
For Linux developers, there are several built-in editors, including VIM, gedit, and Kate. These are all good. Since Linux tends to be biased toward developers, the editors available are all suitable for writing Python.
What's important is that we'll often have two windows open while we're working:
- The script or file that we're working on in our editor of choice.
- Python's >>> prompt (perhaps from a shell or perhaps from IDLE), where we can try things out to see what works and what doesn't. We may be creating our script in Notepad++ but using IDLE to experiment with data structures and algorithms.
We actually have two recipes here. First, we need to set some defaults for our editor. Then, once the editor is set up properly, we can create a generic template for our script files.
How to do it...
First, we'll look at the general setup that we need to do in our editor of choice. We'll use Komodo examples, but the basic principles apply to all editors. Once we've set the edit preferences, we can create our script files:
- Open your editor of choice. Look at the preferences page for the editor.
- Find the settings for preferred file encoding. With Komodo Edit Preferences, it's on the Internationalization tab. Set this to UTF-8.
- Find the settings for indentation. If there's a way to use spaces instead of tabs, check this option. With Komodo Edit, we actually do this backward: we uncheck "prefer tab characters over spaces." Also, set the spaces per indent to four. That's typical for Python code. It allows us to have several levels of indentation and still keep the code fairly narrow.
The rule is this: we want spaces; we do not want tabs.
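If we inherit a file and aren't sure whether it follows this rule, a few lines of Python can tell us. This is only a sketch; the function name lines_indented_with_tabs and the sample text are invented for illustration.

```python
def lines_indented_with_tabs(source: str) -> list[int]:
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # The indent is whatever precedes the first non-whitespace character.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            flagged.append(number)
    return flagged


# Line 2 is indented with a tab; line 3 is indented with four spaces.
sample = "if True:\n\tprint('tab')\n    print('spaces')\n"
print(lines_indented_with_tabs(sample))  # [2]
```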
Once we're sure that our files will be saved in UTF-8 encoding, and we're also sure we're using spaces instead of tabs, we can create an example script file:
- The first line of most Python script files should look like this:
#!/usr/bin/env python3
This sets an association between the file you're writing and Python.
For Windows, the filename-to-program association is done through a setting in one of the Windows control panels. Within the Default Programs control panel, there's a panel to Set Associations. This control panel shows that .py files are bound to the Python program. This is normally set by the installer, and we rarely need to change it or set it manually.
Windows developers can include the preamble line anyway. It will make macOS and Linux folks happy when they download the project from GitHub.
- After the preamble, there should be a triple-quoted block of text. This is the documentation string (called a docstring) for the file we're going to create. It's not technically mandatory, but it's essential for explaining what a file contains:
""" A summary of this script. """
Because Python triple-quoted strings can be indefinitely long, feel free to write as much as necessary. This should be the primary vehicle for describing the script or library module. This can even include examples of how it works.
- Now comes the interesting part of the script: the part that really does something. We can write all the statements we need to get the job done. For now, we'll use this as a placeholder:
print('hello world')
This isn't much, but at least the script does something. In other recipes, we'll look at more complex processing. It's common to create function and class definitions, as well as to write statements to use the functions and classes to do things.
For our first, simple script, all of the statements must begin at the left margin and must be complete on a single line. There are many Python statements that have blocks of statements nested inside them. These internal blocks of statements must be indented to clarify their scope. Generally—because we set indentation to four spaces—we can hit the Tab key to indent.
Our file should look like this:
#!/usr/bin/env python3
"""
My First Script: Calculate an important value.
"""
print(355/113)
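Our first script has no nested blocks. As a small, hypothetical follow-on to the indentation rule mentioned in the steps, here is a sketch of a second script in which a for statement contains an if statement. The variable names are made up for this example.

```python
#!/usr/bin/env python3
"""
My Second Script: Show how nested blocks are indented.
"""

approximations = {"22/7": 22 / 7, "355/113": 355 / 113}

for label, value in approximations.items():
    # Four spaces: these statements belong to the for statement.
    if value > 3.1416:
        # Eight spaces: this statement belongs to the if statement.
        print(label, "is above 3.1416")
    else:
        print(label, "is at or below 3.1416")
```

Each level of nesting adds one more four-space indent, which is what the Tab key inserts once the editor is configured as described.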
How it works...
Unlike other languages, there's very little boilerplate in Python. There's only one line of overhead and even the #!/usr/bin/env python3 line is generally optional.
Why do we set the encoding to UTF-8? While the entire language is designed to work using just the original 128 ASCII characters, we often find that ASCII is limiting. It's easier to set our editor to use UTF-8 encoding. With this setting, we can simply use any character that makes sense. We can use characters like π as Python variables if we save our programs in UTF-8 encoding.
This is legal Python if we save our file in UTF-8:
π = 355/113
print(π)
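To see why the file encoding matters for non-ASCII names, we can experiment at the prompt with the Greek letter π as an example. The sketch below uses only built-in string methods.

```python
name = "π"

# str.isidentifier() reports whether a string is a valid Python name.
print(name.isidentifier())        # True

# In UTF-8, the single character π occupies two bytes.
print(name.encode("utf-8"))       # b'\xcf\x80'

# ASCII has no code for π, so an ASCII-only save cannot represent it.
try:
    name.encode("ascii")
except UnicodeEncodeError:
    ascii_ok = False
else:
    ascii_ok = True
print(ascii_ok)                   # False
```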
It's important to be consistent when choosing between spaces and tabs in Python. They are both more or less invisible, and mixing them can easily lead to confusion. Spaces are suggested.
When we set up our editor to use a four-space indent, we can then use the button labeled Tab on our keyboard to insert four spaces. Our code will align properly, and the indentation will show how our statements nest inside each other.
The initial #! line is a comment. Because the two characters are sometimes called sharp and bang, the combination is called "shebang." Everything between a # and the end of the line is ignored. The Linux loader (the execve system call) looks at the first few bytes of a file to see what the file contains. The first few bytes are sometimes called magic because the loader's behavior seems magical. When present, this two-character sequence of #! is followed by the path to the program responsible for processing the rest of the data in the file. We prefer to use /usr/bin/env to start the Python program for us. We can leverage this to make Python-specific environment settings via the env program.
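The loader's check can be imitated in a few lines of Python. This is a simplified sketch, not how the operating system actually does it: the function name read_shebang is hypothetical, and splitting on whitespace is an assumption made for readability, since real loaders handle arguments differently.

```python
from pathlib import Path
from typing import Optional
import tempfile


def read_shebang(path: Path) -> Optional[list[str]]:
    """Return the interpreter command from a #! first line, or None."""
    with path.open("rb") as source:
        first_line = source.readline()
    # The two "magic" bytes must be the very first bytes of the file.
    if not first_line.startswith(b"#!"):
        return None
    # Drop the magic bytes, then split the rest into words (a simplification).
    return first_line[2:].decode("utf-8").split()


# Hypothetical demo script, written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    script = Path(folder) / "first.py"
    script.write_text("#!/usr/bin/env python3\nprint(355/113)\n", encoding="utf-8")
    command = read_shebang(script)

print(command)  # ['/usr/bin/env', 'python3']
```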
There's more...
The Python Standard Library documents are derived, in part, from the documentation strings present in the module files. It's common practice to write sophisticated docstrings in modules. There are tools like pydoc and Sphinx that can reformat the module docstrings into elegant documentation. We'll look at this in other recipes.
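Documentation tools start from the same place we can reach ourselves: the docstring attached to a module, class, or function. As a minimal sketch, using a made-up function named circle_area, the standard library's inspect.getdoc() returns the text with its indentation cleaned up.

```python
import inspect


def circle_area(radius: float) -> float:
    """
    Approximate the area of a circle using 355/113 for pi.
    """
    return 355 / 113 * radius ** 2


# inspect.getdoc() strips the surrounding blank lines and common indentation.
print(inspect.getdoc(circle_area))
```

The interactive help() function presents the same text in a formatted page.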
Additionally, unit test cases can be included in the docstrings. Tools like doctest can extract examples from the document string and execute the code to see if the answers in the documentation match the answers found by running the code. Most of this book is validated with doctest.
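Here is a small sketch of what that looks like. The function ratio is invented for this example. In everyday use, we would more likely run python -m doctest on the file; the explicit parser and runner below are used only so the example is self-contained.

```python
import doctest


def ratio(numerator: int, denominator: int) -> float:
    """
    Divide two integers using true division.

    >>> ratio(355, 113) > 3.14
    True
    >>> ratio(1, 4)
    0.25
    """
    return numerator / denominator


# Extract the >>> examples from the docstring and execute them.
parser = doctest.DocTestParser()
test = parser.get_doctest(ratio.__doc__, {"ratio": ratio}, "ratio", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.attempted, "examples run,", results.failed, "failed")
```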
Triple-quoted documentation strings are preferred over # comments. While all text between # and the end of the line is ignored, a comment is limited to a single line, and comments are used sparingly. A docstring can be of indefinite size; docstrings are used widely.
Prior to Python 3.6, we might sometimes see this kind of thing in a script file:
color = 355/113 # type: float
The # type: float comment can be used by a type inferencing system to establish the various data types that can occur when the program is actually executed. For more information on this, see Python Enhancement Proposal (PEP) 484: https://www.python.org/dev/peps/pep-0484/.
The preferred style is this:
color: float = 355/113
The type hint is provided immediately after the variable name. This is based on PEP 526, https://www.python.org/dev/peps/pep-0526. In this case, the type hint is obvious and possibly redundant. The result of true division (the / operator) on two integers is a floating-point value, and type inferencing tools like mypy are capable of figuring out the specific type for obvious cases like this.
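It may help to see that these hints are not enforced when the program runs; they are there for tools and for readers. In the sketch below, the variable mislabeled is a deliberately wrong, hypothetical example.

```python
color: float = 355 / 113

# Python does not check annotations at run time, so this line executes without
# complaint. A checker such as mypy would report the mismatch instead.
mislabeled: int = 355 / 113

print(type(color).__name__)       # float
print(type(mislabeled).__name__)  # float
```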
There's another bit of overhead that's sometimes included in a file. The VIM and gedit editors let us keep edit preferences in the file. This is called a modeline. We may see these; they can be ignored. Here's a typical modeline that's useful for Python:
# vim: tabstop=8 expandtab shiftwidth=4 softtabstop=4
This sets the Unicode U+0009 TAB characters to be transformed to eight spaces; when we hit the Tab key, we'll shift four spaces. This setting is carried in the file; we don't have to do any VIM setup to apply these settings to our Python script files.
See also
- We'll look at how to write useful document strings in the Including descriptions and documentation and Writing better RST markup in docstrings recipes.
- For more information on suggested style, see https://www.python.org/dev/peps/pep-0008/