Studying data via stream statistics
While Kettle's forte is extracting, manipulating, and loading data, it also includes an entire set of tools for generating statistics and analytics-style data from the data stream. Kettle treats the data worked on in a transformation as a stream flowing from an input to an output. This recipe focuses on several of the tools that gather statistics about that stream, giving you even more insight into your data.
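To make the idea of "statistics over a stream" concrete, here is a minimal sketch, assuming only plain Java and no Kettle APIs. It computes count, mean, minimum, maximum, and sample variance in a single pass, updating the figures as each value flows past, much as a transformation sees one row at a time. The class name `StreamStats` and the values in `main` are hypothetical and purely illustrative; this is not how Kettle's own steps are implemented. The running variance uses Welford's method, which avoids keeping every row in memory.

```java
import java.util.List;

/**
 * Hypothetical sketch: one-pass statistics over a stream of numeric values.
 * Illustrates the concept only; this is not Kettle's implementation.
 */
public class StreamStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations (Welford's method)
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Update all statistics with one value as it passes through. */
    public void accept(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
        min = Math.min(min, x);
        max = Math.max(max, x);
    }

    public long count() { return count; }
    public double mean() { return mean; }
    public double min() { return min; }
    public double max() { return max; }

    /** Sample variance (n - 1 denominator); NaN until at least two values are seen. */
    public double sampleVariance() {
        return count > 1 ? m2 / (count - 1) : Double.NaN;
    }

    public static void main(String[] args) {
        // Made-up values standing in for a salary field read row by row.
        List<Double> salaries = List.of(100.0, 200.0, 300.0, 400.0);
        StreamStats stats = new StreamStats();
        for (double s : salaries) {
            stats.accept(s); // each value is seen once, then discarded
        }
        System.out.printf("n=%d mean=%.1f min=%.1f max=%.1f variance=%.2f%n",
                stats.count(), stats.mean(), stats.min(), stats.max(),
                stats.sampleVariance());
    }
}
```

Because every statistic is updated incrementally, memory use stays constant no matter how many rows pass through. That property is what makes stream-oriented statistics practical on large inputs.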
Getting ready
Rather than one large process, this recipe is made up of several smaller recipes on the same subject. We will be using the baseball salary dataset, which can be found on the book's website or on Lahman's Baseball Archive website at http://www.seanlahman.com/baseball-archive/statistics/. The code for this recipe is also available on the book's website.
The recipe will be broken into smaller recipes that will focus on three...