Summary
In this chapter, we covered the statistical fundamentals of data mining that are often assessed in data science interviews. We reviewed the basics of probability; how to describe data using measures of centrality and variability; how to estimate population parameters through sampling; the relevance of the CLT and the assumption of normality; and common probability distributions and hypothesis testing. With these principles, you will be able to identify and describe the statistics relevant to a dataset and formulate testable hypotheses. You will also be less likely to be fooled by misused statistics that distort our understanding of data.
Be aware that some interviewers will ask theoretical questions, while others will want you to work through the solution to a problem. Either way, a solid grasp of statistics pays off: it is the backbone of many machine learning algorithms and experimental designs, which are prominent in data science across industries.
In the next chapter, we will...