Summary
In this first chapter, we've learned about summary statistics and the value of distributions. We've seen how even a simple analysis can provide evidence of potentially fraudulent activity.
In particular, we've encountered the central limit theorem and seen why it goes such a long way towards explaining the ubiquity of the normal distribution throughout data science. An appropriate distribution can represent the essence of a large sequence of numbers in just a few statistics and we've implemented several of them using pure Clojure functions in this chapter. We've also introduced the Incanter library and used it to load, transform, and visually compare several datasets. We haven't been able to do much more than note a curious difference between two distributions, however.
In the next chapter, we'll extend what we've learned about descriptive statistics to cover inferential statistics. These will allow us to quantify a measured difference between two or more distributions and decide whether a difference is statistically significant. We'll also learn about hypothesis testing—a framework for conducting robust experiments that allow us to draw conclusions from data.