Avoiding a potential problem with break statements
The common way to understand a for statement is that it creates a "for all" condition. At the end of the statement, we can assert that, for all items in a collection, some processing has been done.
This isn't the only meaning for a for statement. When we introduce the break statement inside the body of a for, we change the semantics to "there exists". When the break statement leaves the for (or while) statement, we can assert only that there exists at least one item that caused the statement to end.
There's a side issue here. What if the for statement ends without executing break? Either way, we're at the statement after the for statement.
The condition that's true upon leaving a for or while statement with a break can be ambiguous. Did it end normally? Did it execute break? We can't easily tell, so we'll provide a recipe that gives us some design guidance.
This can become an even bigger problem when we have multiple break statements, each with its own condition. How can we minimize the problems created by having complex conditions?
Getting ready
When parsing configuration files, we often need to find the first occurrence of a : or = character in a string. This is common when looking for lines that have a similar syntax to assignment statements, for example, option = value or option : value. The properties file format uses lines where : (or =) separates the property name from the property value.
This is a good example of a "there exists" modification to a for statement. We don't want to process all characters; we want to know where the leftmost : or = is.
Here's the sample data we're going to use as an example:
>>> sample_1 = "some_name = the_value"
Here's a small for statement to locate the leftmost "=" or ":" character in the sample string value:
>>> for position in range(len(sample_1)):
...     if sample_1[position] in '=:':
...         break
...
>>> print(f"name={sample_1[:position]!r}",
...     f"value={sample_1[position+1:]!r}")
name='some_name ' value=' the_value'
When the "=" character is found, the break statement stops the for statement. The value of the position variable shows where the desired character was found.
What about this edge case?
>>> sample_2 = "name_only"
>>> for position in range(len(sample_2)):
...     if sample_2[position] in '=:':
...         break
...
>>> print(f"name={sample_2[:position]!r}",
...     f"value={sample_2[position+1:]!r}")
name='name_onl' value=''
The result is awkwardly wrong: the y character got dropped from the value of name. Why did this happen? And, more importantly, how can we make the condition at the end of the for statement clearer?
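As a small sketch of what went wrong: when the for statement ends without executing break, the position variable keeps the last value produced by range(), which is the index of the final character. The last_index name below is introduced only for this illustration.

```python
sample_2 = "name_only"

# No character matches '=' or ':', so the break is never executed.
for position in range(len(sample_2)):
    if sample_2[position] in '=:':
        break

# position keeps the last value produced by range(): the final index.
last_index = len(sample_2) - 1
print(position == last_index)        # True
print(repr(sample_2[:position]))     # 'name_onl'
print(repr(sample_2[position+1:]))   # ''
```

The slice sample_2[:position] stops just before the final character, which is why the y is missing from the name.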
How to do it...
Every statement establishes a post condition. When designing a for or while statement, we need to articulate the condition that's true at the end of the statement. In this case, the post condition of the for statement is quite complicated.
Ideally, the post condition is something simple like text[position] in '=:'. In other words, the value of position is the location of the "=" or ":" character. However, if there's no = or : in the given text, the overly simple post condition can't be true. At the end of the for statement, one of two things is true: either (a) the character with the index of position is "=" or ":", or (b) all characters have been examined and no character is "=" or ":".
Our application code needs to handle both cases. It helps to carefully articulate all of the relevant conditions.
- Write the obvious post condition. We sometimes call this the happy-path condition because it's the one that's true when nothing unusual has happened:
text[position] in '=:'
- Create the overall post condition by adding the conditions for the edge cases. In this example, we have two additional conditions:
  - There's no = or :.
  - There are no characters at all. len() is zero, and the for statement never actually does anything. In this case, the position variable will never be created.
In this example, we have three conditions:
(len(text) == 0
or not('=' in text or ':' in text)
or text[position] in '=:')
- If a while statement is being used, consider redesigning it to have the overall post condition in the while clause. This can eliminate the need for a break statement.
- If a for statement is being used, be sure a proper initialization is done, and add the various terminating conditions to the statements after the loop. It can look redundant to have x = 0 followed by for x in .... It's necessary in the case of a for statement that doesn't execute the break statement. Here's the resulting for statement and a complicated if statement to examine all of the possible post conditions:
>>> position = -1
>>> for position in range(len(sample_2)):
...     if sample_2[position] in '=:':
...         break
...
>>> if position == -1:
...     print(f"name=None value=None")
... elif not(sample_2[position] == ':' or sample_2[position] == '='):
...     print(f"name={sample_2!r} value=None")
... else:
...     print(f"name={sample_2[:position]!r}",
...         f"value={sample_2[position+1:]!r}")
name='name_only' value=None
In the statements after the for, we've enumerated all of the terminating conditions explicitly. If the position found is -1, then the for loop did not process any characters. If the character at the position is not one of the expected characters, then all the characters were examined. The third case is that one of the expected characters was found. The final output, name='name_only' value=None, confirms that we've correctly processed the sample text.
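The while statement variant mentioned in the steps can be sketched as follows. This is one possible design, not the only one: the function name find_separator and the convention of returning len(text) when no separator exists are assumptions made for this illustration.

```python
def find_separator(text: str) -> int:
    """Return the index of the leftmost '=' or ':' in text.

    Hypothetical helper: returns len(text) when no separator is present.
    """
    position = 0
    # The while clause carries both terminating conditions,
    # so no break statement is needed.
    while position != len(text) and text[position] not in '=:':
        position += 1
    # Post condition: position == len(text) or text[position] in '=:'
    return position


print(find_separator("some_name = the_value"))  # 10
print(find_separator("name_only"))              # 9
print(find_separator(""))                       # 0
```

Because the loop condition is the negation of the overall post condition, the statement after the while needs only one comparison, position == len(text), to tell the two outcomes apart.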
How it works...
This approach forces us to work out the post condition carefully so that we can be absolutely sure that we know all the reasons for the loop terminating.
In more complex, nested for and while statements, with multiple break statements, the post condition can be difficult to work out fully. A for statement's post condition must include all of the reasons for leaving the loop: the normal reasons plus all of the break conditions.
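To illustrate a loop with more than one break condition, here is a minimal sketch that records the reason for leaving the loop in a variable. The scan() function, the reason strings, and the treatment of '#' as a second terminating character are all hypothetical; they are not part of the recipe's sample data.

```python
from typing import Optional


def scan(text: str) -> tuple[str, Optional[int]]:
    """Hypothetical scan with two break conditions.

    Returns the reason the loop ended and the index where it stopped,
    or None when every character was examined without a break.
    """
    # Initialize for the "ended normally" and "no characters" cases.
    reason, found_at = "exhausted", None
    for position in range(len(text)):
        if text[position] in '=:':
            reason, found_at = "separator", position
            break
        if text[position] == '#':
            reason, found_at = "comment", position
            break
    # Post condition: reason names exactly one of the three ways out.
    return reason, found_at


print(scan("a = b"))   # ('separator', 2)
print(scan("# note"))  # ('comment', 0)
print(scan("abc"))     # ('exhausted', None)
```

Each break sets the same pair of variables before leaving, so the statements after the loop never have to guess which condition ended it.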
In many cases, we can refactor the for statement. Rather than simply asserting that position is the index of the = or : character, we include the next processing steps of assigning substrings to the name and value variables. We might have something like this:
>>> if len(sample_2) > 0:
...     name, value = sample_2, None
... else:
...     name, value = None, None
...
>>> for position in range(len(sample_2)):
...     if sample_2[position] in '=:':
...         name, value = sample_2[:position], sample_2[position+1:]
...         break
...
>>> print(f"{name=} {value=}")
name='name_only' value=None
This version pushes some of the processing forward, based on the complete set of post conditions evaluated previously. The initial values for the name and value variables reflect the two edge cases: there's no = or : in the data or there's no data at all. Inside the for statement, the name and value variables are set prior to the break statement, assuring a consistent post condition.
The idea here is to forgo any assumptions or intuition. With a little bit of discipline, we can be sure of the post conditions. The more we think about post conditions, the more precise our software can be. It's imperative to be explicit about the condition that's true when our software works. This is the goal for our software, and we can work backward from the goal by choosing the simplest statements that will make the goal conditions true.
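One way to make a post condition explicit is to write it as assert statements at the end of a function. The following is a sketch only: the parse_option() name and the particular assertions are assumptions chosen for illustration, built on the same loop used in this recipe.

```python
def parse_option(text: str):
    """Hypothetical wrapper that splits text at the leftmost '=' or ':'."""
    # Initial values cover the two edge cases: no separator, or no data.
    name, value = (text, None) if len(text) > 0 else (None, None)
    for position in range(len(text)):
        if text[position] in '=:':
            name, value = text[:position], text[position+1:]
            break
    # The post condition, stated as executable checks.
    if value is None:
        # Edge cases: there is no '=' or ':' anywhere in the text.
        assert not ('=' in text or ':' in text)
    else:
        # Happy path: the separator sits immediately after the name.
        separator = text[len(name)]
        assert separator in '=:'
        assert name + separator + value == text
    return name, value


print(parse_option("some_name = the_value"))  # ('some_name ', ' the_value')
print(parse_option("name_only"))              # ('name_only', None)
print(parse_option(""))                       # (None, None)
```

The assertions document the goal of the function and fail loudly if a later change to the loop breaks it.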
There's more...
We can also use an else clause on a for statement to determine if the statement finished normally or a break statement was executed. We can use something like this:
>>> for position in range(len(sample_2)):
...     if sample_2[position] in '=:':
...         name, value = sample_2[:position], sample_2[position+1:]
...         break
... else:
...     if len(sample_2) > 0:
...         name, value = sample_2, None
...     else:
...         name, value = None, None
...
>>> print(f"{name=} {value=}")
name='name_only' value=None
Using an else clause in a for statement is sometimes confusing, and we don't recommend it. It's not clear if this version is substantially better than any of the alternatives. It's too easy to forget the reason why else is executed because it's used so rarely.
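As a reminder of when the else clause runs, here is a minimal sketch. The ended_by() function and its return strings are hypothetical, introduced only to label the two outcomes.

```python
def ended_by(text: str) -> str:
    """Hypothetical helper reporting how the for statement ended."""
    for character in text:
        if character in '=:':
            break
    else:
        # Runs only when the loop finished without executing break.
        # This includes the case where text is empty.
        return "else clause"
    return "break"


print(ended_by("a=b"))  # break
print(ended_by("abc"))  # else clause
print(ended_by(""))     # else clause
```

The else clause belongs to the for statement, not to the if statement inside it, which is the detail that is easiest to forget.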
See also
- A classic article on this topic is by David Gries, A note on a standard strategy for developing loop invariants and loops. See http://www.sciencedirect.com/science/article/pii/0167642383900151