Cloning iterators with tee()
The tee() function gives us a way to circumvent one of the important Python rules for working with iterables. The rule is so important, we'll repeat it here.
Note
Iterators can be used only once.
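As a quick illustration of the rule, the following minimal sketch makes two passes over the same generator expression. The names squares, first_pass, and second_pass are made up for this example.

```python
# A generator expression is an iterator: it yields each value once.
squares = (n * n for n in range(4))

first_pass = list(squares)   # consumes every value
second_pass = list(squares)  # the iterator is already exhausted

print(first_pass)   # [0, 1, 4, 9]
print(second_pass)  # []
```

The second pass is silently empty; Python raises no error when an exhausted iterator is reused.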
The tee() function allows us to clone an iterator. This seems to free us from having to materialize a sequence so that we can make multiple passes over the data. For example, a simple average for an immense dataset could be written in the following way:
from itertools import tee

def mean(iterator):
    it0, it1 = tee(iterator, 2)
    s0 = sum(1 for x in it0)  # count of items
    s1 = sum(x for x in it1)  # sum of items
    return s1 / s0
This would compute an average without appearing to materialize the entire dataset in memory in any form.
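To make the cloning behavior concrete, here is a small sketch showing that each clone returned by tee() yields the full sequence independently. The names source, left, and right are illustrative only.

```python
from itertools import tee

source = iter([3, 5, 8])
left, right = tee(source, 2)

# Each clone can be consumed on its own, in full.
left_items = list(left)
right_items = list(right)

print(left_items)   # [3, 5, 8]
print(right_items)  # [3, 5, 8]
```

Once tee() has been called, the original iterator (source here) should not be used directly, because advancing it would leave the clones out of sync.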
While interesting in principle, the tee() function's implementation suffers from a severe limitation. In most Python implementations, the cloning is done by materializing a sequence. While this circumvents the "one time only" rule for small collections, it doesn't work out well for immense collections...
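For comparison, an average can be computed in a single pass, so nothing has to be buffered. The following is only a sketch of one alternative, not a technique taken from this text; the function name mean_single_pass and the choice to raise ValueError on empty input are assumptions made for the example.

```python
def mean_single_pass(iterable):
    """Compute the mean in one pass, keeping only two running values."""
    count = 0
    total = 0
    for x in iterable:
        count += 1
        total += x
    if count == 0:
        # Assumption: treat an empty input as an error rather than return 0.
        raise ValueError("mean of an empty iterable is undefined")
    return total / count

result = mean_single_pass(iter([1, 2, 3, 4]))
print(result)  # 2.5
```

Because the count and the sum are accumulated together, the iterator is consumed exactly once and memory use stays constant regardless of the size of the dataset.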