Summary
In this chapter, we learned about the main processes in conducting data analysis: data collection, data wrangling, EDA, and drawing conclusions. We followed that up with an overview of descriptive statistics and learned how to describe the central tendency and spread of our data; how to summarize it both numerically and visually using the 5-number summary, box plots, histograms, and kernel density estimates; how to scale our data; and how to quantify relationships between variables in our dataset.
We got an introduction to prediction and time series analysis. Then, we had a very brief overview of some core topics in inferential statistics that can be explored after mastering the contents of this book. Note that while all the examples in this chapter were of one or two variables, real-life data is often high-dimensional. Chapter 10, Making Better Predictions – Optimizing Models, will touch on some ways to address this. Lastly, we set up our virtual environment for this book and learned how to work with Jupyter Notebooks.
Now that we have built a strong foundation, we will start working with data in Python in the next chapter.