Cloning iterators with tee()
The tee() function gives us a way to circumvent one of the important Python rules for working with iterables. The rule is so important, we'll repeat it here:
Note
Iterators can be used only once.
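The rule is easy to see in a small experiment. The following is a minimal sketch; the names numbers, first_pass, and second_pass are illustrative and not part of the original text:

```python
# An iterator over a small, hypothetical dataset.
numbers = iter([1, 2, 3])

# The first pass consumes every item.
first_pass = list(numbers)

# The second pass finds the iterator already exhausted.
second_pass = list(numbers)

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []
```

The second list is empty because the iterator keeps no record of the items it has already produced.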
The tee() function allows us to clone an iterator. This seems to free us from having to materialize a sequence so that we can make multiple passes over the data. For example, a simple average for an immense dataset could be written in the following way:
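from itertools import tee
from typing import Iterator

def mean(iterator: Iterator[float]) -> float:
    # Clone the iterator so that it can be traversed twice.
    it0, it1 = tee(iterator, 2)
    N = sum(1 for x in it0)    # first pass: count the items
    s1 = sum(x for x in it1)   # second pass: total the items
    return s1/N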
This would compute an average without appearing to materialize the entire dataset in memory in any form. Note that the type hint of float doesn't preclude integers. The mypy program is aware of the type coercion rules, and this definition provides a flexible way to specify that either int or float will work.
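To see the two clones at work, here is a small sketch that repeats the same two passes outside a function. The sample values and the names source, first, second, count, total, and average are assumptions made for illustration only:

```python
from itertools import tee

# A hypothetical three-item dataset standing in for an immense one.
source = iter([2.0, 4.0, 6.0])

# tee() returns independent iterators over the same underlying data.
first, second = tee(source, 2)

count = sum(1 for _ in first)   # consumes only the first clone
total = sum(second)             # the second clone still yields every item
average = total / count

print(count, total, average)    # 3 12.0 4.0
```

Once the clones exist, the original source iterator should not be used directly, because advancing it would cause the clones to miss those items.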
While interesting in principle, the tee() function's implementation...