Task 3 – Calculating the average length of words in a stream
In this task, we will investigate how to use CombineFn and accumulators to compute a reduction that is not directly combinable, such as an average. Let's see how this works.
Defining the problem
Given an input data stream of lines of text, calculate the average length of all words seen so far in the stream. Output the current average as frequently as possible, ideally after every word.
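Before bringing in any streaming machinery, the expected output can be pinned down with a plain Python sketch that processes the lines one at a time and emits the running average after every word. Splitting words on whitespace is an assumption made here for illustration, and the function name `running_word_length_averages` is hypothetical.

```python
from typing import Iterable, Iterator


def running_word_length_averages(lines: Iterable[str]) -> Iterator[float]:
    """Yield the average word length seen so far, once per word.

    Assumption: a "word" is any whitespace-separated token.
    """
    total_length = 0  # sum of the lengths of all words seen so far
    word_count = 0    # number of words seen so far
    for line in lines:
        for word in line.split():
            total_length += len(word)
            word_count += 1
            yield total_length / word_count


averages = list(running_word_length_averages(["a bb", "ccc"]))
print(averages)  # [1.0, 1.5, 2.0]
```

The only state carried from one word to the next is a running sum and a running count, which is exactly the accumulator discussed below.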
Discussing the problem decomposition
Calculating an average is not a directly combinable operation: an average of averages is, in general, not the average of the original data (the two agree only when every partial average covers the same number of elements). However, we can calculate an average using an accumulator. The accumulator is a pair of (sum, count), and the output is extracted by a function that divides the sum by the count. We can illustrate this with Figure 2.9:
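A small worked example shows why averaging averages fails while a (sum, count) pair does not. Suppose one worker sees the words "a" and "bb" (average 1.5) and another sees only "cccccc" (average 6). The average of the two averages is (1.5 + 6) / 2 = 3.75, but the true average is (1 + 2 + 6) / 3 = 3. The plain Python sketch below mirrors the four method names of Beam's Python `CombineFn` (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`) without depending on Beam. The class name `AverageWordLength` is hypothetical; a real pipeline would subclass `apache_beam.CombineFn` (or `Combine.CombineFn` in the Java SDK) instead.

```python
from typing import Iterable, Tuple

# The accumulator: (sum of word lengths, number of words).
Accumulator = Tuple[int, int]


class AverageWordLength:
    """Plain-Python stand-in mirroring the method names of Beam's CombineFn."""

    def create_accumulator(self) -> Accumulator:
        # An empty accumulator: no length seen, no words counted.
        return (0, 0)

    def add_input(self, accumulator: Accumulator, word: str) -> Accumulator:
        # Fold one word into the running sum and count.
        total, count = accumulator
        return (total + len(word), count + 1)

    def merge_accumulators(self, accumulators: Iterable[Accumulator]) -> Accumulator:
        # Partial results combine by summing both components,
        # which is exactly what an average of averages cannot do.
        total, count = 0, 0
        for partial_total, partial_count in accumulators:
            total += partial_total
            count += partial_count
        return (total, count)

    def extract_output(self, accumulator: Accumulator) -> float:
        # Divide only at the very end. Returning NaN for an empty
        # accumulator is a choice made for this sketch.
        total, count = accumulator
        return total / count if count else float("nan")


fn = AverageWordLength()

# Two "workers" each accumulate their own words independently.
left = fn.create_accumulator()
for word in ["a", "bb"]:
    left = fn.add_input(left, word)
right = fn.add_input(fn.create_accumulator(), "cccccc")

merged = fn.merge_accumulators([left, right])
print(merged)                     # (9, 3)
print(fn.extract_output(merged))  # 3.0

# The naive average of averages gives a different (wrong) answer.
naive = (fn.extract_output(left) + fn.extract_output(right)) / 2
print(naive)                      # 3.75
```

Because summing the pairs component-wise is associative and commutative, partial accumulators can be merged in any grouping and order, which is what allows the reduction to be split across workers.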
We will need to create an accumulator object for...