Summary
This chapter has been a long one, but the effort is worthwhile. Random variation is a component of any dataset, so knowing how to characterize and describe it is a key skill for any data scientist. In this chapter, we have learned the following:
- How and why randomness arises in data
- How random variables are a natural concept to describe randomness in data
- Key aspects of random variables, such as their probability distributions, and how to use summary metrics such as the mean and variance to characterize a distribution
- How we can think of datasets as samples drawn from an underlying distribution, which is what we are really interested in understanding
- How to summarize a sample using the sample mean and sample variance
- How sample characteristics, such as the sample mean and sample variance, can be related back to the corresponding quantities of the underlying population...