Generator expressions are actually a sort of comprehension too; they compress the more advanced (this time it really is more advanced!) generator syntax into one line. The full generator syntax looks even less object-oriented than anything we've seen, but we'll discover that, once again, it is a simple syntax shortcut to create a kind of object.
Let's take the log file example a little further. If we want to delete the WARNING column from our output file (since it's redundant: this file contains only warnings), we have several options at various levels of readability. We can do it with a generator expression:
import sys

# generator expression
inname, outname = sys.argv[1:3]

with open(inname) as infile:
    with open(outname, "w") as outfile:
        warnings = (
            l.replace("\tWARNING", "") for l in infile if "WARNING" in l
        )
        for l in warnings:
            outfile.write(l)
That's perfectly readable, though I wouldn't want to make the expression much more complicated than that. We could also do it with a normal for loop:
with open(inname) as infile:
    with open(outname, "w") as outfile:
        for l in infile:
            if "WARNING" in l:
                outfile.write(l.replace("\tWARNING", ""))
That's clearly maintainable, but so many levels of indentation in so few lines is kind of ugly. More alarmingly, if we wanted to do something other than write the lines out, we'd have to duplicate the looping and conditional code, too.
Now let's consider a truly object-oriented solution, without any shortcuts:
class WarningFilter:
    def __init__(self, insequence):
        self.insequence = insequence

    def __iter__(self):
        return self

    def __next__(self):
        l = self.insequence.readline()
        while l and "WARNING" not in l:
            l = self.insequence.readline()
        if not l:
            raise StopIteration
        return l.replace("\tWARNING", "")


with open(inname) as infile:
    with open(outname, "w") as outfile:
        filter = WarningFilter(infile)
        for l in filter:
            outfile.write(l)
No doubt about it: that is so ugly and difficult to read that you may not even be able to tell what's going on. We created an object that takes a file object as input, and provides a __next__ method like any iterator.
This __next__ method reads lines from the file, discarding them if they are not WARNING lines. When we encounter a WARNING line, we modify and return it. Then our for loop calls __next__ again to process the subsequent WARNING line. When we run out of lines, we raise StopIteration to tell the loop we're finished iterating. It's pretty ugly compared to the other examples, but it's also powerful; now that we have a class in our hands, we can do whatever we want with it.
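To make those mechanics concrete, here is a sketch (not part of the original example) that drives the same class by hand, using io.StringIO with made-up log contents to stand in for the open file:

```python
import io


class WarningFilter:
    def __init__(self, insequence):
        self.insequence = insequence

    def __iter__(self):
        return self

    def __next__(self):
        l = self.insequence.readline()
        while l and "WARNING" not in l:
            l = self.insequence.readline()
        if not l:
            raise StopIteration
        return l.replace("\tWARNING", "")


# io.StringIO stands in for an open file; the log lines are invented.
log = io.StringIO(
    "Jan 26, 2015 11:25:46\tDEBUG\tstarting up\n"
    "Jan 26, 2015 11:25:59\tWARNING\tdisk space low\n"
)
filt = WarningFilter(log)
# Calling next() is exactly what a for loop does behind the scenes.
print(next(filt))  # the DEBUG line is skipped; "\tWARNING" is stripped
```

Once the underlying file is exhausted, the next call to next() raises StopIteration, which is the signal a for loop uses to stop.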
With that background behind us, we finally get to see true generators in action. This next example does exactly the same thing as the previous one: it creates an object with a __next__ method that raises StopIteration when it's out of inputs:
def warnings_filter(insequence):
    for l in insequence:
        if "WARNING" in l:
            yield l.replace("\tWARNING", "")


with open(inname) as infile:
    with open(outname, "w") as outfile:
        filter = warnings_filter(infile)
        for l in filter:
            outfile.write(l)
OK, that's pretty readable, maybe... at least it's short. But what on earth is going on here? It makes no sense whatsoever. And what is yield, anyway?
In fact, yield is the key to generators. When Python sees yield in a function, it wraps that function up in an object not unlike the one in our previous example. Think of the yield statement as similar to the return statement: it exits the function and returns a value. Unlike return, however, when next() is called on the generator again, execution resumes where it left off, on the line after the yield statement, instead of at the beginning of the function. In this example, there is no line after the yield statement, so execution jumps to the next iteration of the for loop. Since the yield statement is inside an if statement, it only yields lines that contain WARNING.
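A throwaway generator (not part of the log example) makes this pause-and-resume behavior visible:

```python
def counter():
    print("function body starts")
    yield 1
    print("resumed after the first yield")
    yield 2
    print("resumed after the second yield")


gen = counter()   # nothing runs yet; we just get a generator object
print(next(gen))  # runs to the first yield: prints the banner, then 1
print(next(gen))  # resumes on the line after the first yield, then 2
```

Note that merely calling counter() executes none of the body; the first next() call is what starts it running.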
While it looks like this is just a function looping over the lines, it is actually creating a special type of object, a generator object:
>>> print(warnings_filter([]))
<generator object warnings_filter at 0xb728c6bc>
I passed an empty list into the function to act as an iterator. All the function does is create and return a generator object. That object has __iter__ and __next__ methods on it, just like the one we created in the previous example. (You can call the dir built-in function on it to confirm.) Whenever __next__ is called, the generator runs the function until it finds a yield statement. It then returns the value from yield, and the next time __next__ is called, it picks up where it left off.
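We can check those claims directly. This sketch feeds the generator a hand-built list of strings (any iterable of lines would do; the contents here are invented):

```python
def warnings_filter(insequence):
    for l in insequence:
        if "WARNING" in l:
            yield l.replace("\tWARNING", "")


gen = warnings_filter(["one\tWARNING\talpha\n", "two\tDEBUG\tbeta\n"])

# The generator object carries the iterator protocol methods.
assert "__iter__" in dir(gen) and "__next__" in dir(gen)
assert iter(gen) is gen      # a generator is its own iterator

print(gen.__next__())        # runs the body up to the first yield
```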
This use of generators isn't that advanced, but if you don't realize the function is creating an object, it can seem like magic. This example was quite simple, but you can get really powerful effects by making multiple calls to yield in a single function; on each loop, the generator will simply pick up at the most recent yield and continue to the next one.
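For instance, a generator with several yield statements (a contrived sketch, not from the log example) hands back a value from each one in turn:

```python
def report(lines):
    # Multiple yields: each next() resumes at the most recent yield
    # and runs forward to the following one.
    yield "=== begin report ==="
    for line in lines:
        yield line
    yield "=== end report ==="


print(list(report(["first", "second"])))
```

Here list() drives the generator to exhaustion, collecting the header, each line, and the footer in the order they were yielded.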