Analyzing many variables in one pass
In many cases, we'll have data with multiple variables that we'd like to analyze. The data can be visualized as filling in a grid, with each row containing a specific outcome. Each outcome row has multiple variables in columns. Many recipes in this chapter have a very narrow grid with only two variables, x and y. Two recipes earlier in this chapter, Computing an autocorrelation and Confirming that the data is random – the null hypothesis, have relied on data with more than two variables.
For many of these recipes, we have followed a pattern of treating the data as if it is provided in column-major order: the recipe processed each variable (from a column of data) independently. This leads to visiting each row of data multiple times. For a large number of rows, this can become inefficient.
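As a minimal sketch of the column-major pattern described above, the following computes a mean for each variable in its own pass. The `rows` list and its values are made up for illustration, and the dictionary-per-row layout is an assumption rather than the format used by the earlier recipes.

```python
from statistics import mean

# Hypothetical sample data: each dictionary is one outcome row,
# with the variables x and y as its columns.
rows = [
    {"x": 10.0, "y": 8.04},
    {"x": 8.0, "y": 6.95},
    {"x": 13.0, "y": 7.58},
]

# Column-major order: each variable is summarized independently,
# so every row is visited once per variable.
x_mean = mean(row["x"] for row in rows)  # first pass over all rows
y_mean = mean(row["y"] for row in rows)  # second pass over all rows

print(f"mean x = {x_mean:.3f}, mean y = {y_mean:.3f}")
```

With two variables the rows are read twice; with more variables, the number of passes grows with the number of columns.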
The alternative is to follow a pattern of row-major order. This means processing all the variables at once for each row of data. This...