Statistics and data science
The British mathematician Karl Pearson once stated, “Statistics is the grammar of science.”
If you’re starting to lead data science, ML, or AI initiatives within your organization, or simply working alongside data scientists and ML engineers, a foundation in statistics is essential.
Statistics offers tools and techniques for identifying patterns and uncovering deeper insights in data, which gives you an advantage when extracting value from it. A good grasp of statistics also helps you think critically, approach problem-solving creatively, and make data-driven decisions. In this section, we aim to cover essential statistical topics that are relevant to data science.
What is statistics?
Before going further, it’s helpful to define what we mean by statistics, as the term is used in several different ways. It can be used to do the following:
- Indicate the whole discipline of statistics
- Refer to the methods that are used to collect, process, and interpret quantitative data
- Refer to collections of gathered data
- Refer to calculated figures (such as the mean) that are used to interpret the data that’s been gathered
Here, we use the second definition: statistics as the methods that are used to collect, process, and interpret quantitative data.
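To make the distinction between these meanings concrete, here is a minimal sketch in Python. It assumes a small, made-up list of survey scores (the name `responses` and its values are hypothetical, chosen only for illustration) and uses the standard library's `statistics` module to turn gathered data into calculated figures.

```python
import statistics

# "Statistics" as a collection of gathered data:
# hypothetical survey scores on a 1-10 scale (made-up values).
responses = [7, 8, 6, 9, 7, 8]

# "Statistics" as calculated figures that summarize the data.
mean_score = statistics.mean(responses)   # arithmetic mean
spread = statistics.stdev(responses)      # sample standard deviation

print(f"Mean score: {mean_score}")
print(f"Sample standard deviation: {spread:.2f}")
```

The list is the gathered data, the mean and standard deviation are calculated figures, and the act of choosing and applying those summaries is statistics in the sense of method, which is the sense used in the rest of this chapter.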
Today, few industries are untouched by statistical thinking. For example, within market research, statistics is used to draw survey samples and to compare results between groups to understand which differences are statistically significant; within life sciences, statistics is used to measure and evaluate the efficacy of pharmaceuticals; and within financial services, statistics is used to model and understand risk.
I’m sure you’re familiar with many of these and other applications of statistics. You may also have studied statistics at school, at college, or during your professional career, so much of what follows in this chapter may not be new to you. Even so, a refresher can be useful, as unfortunately it’s rarely possible to pause a career to complete a statistics course.
When you’re leading data science, ML, or AI initiatives, understanding statistics is an essential skill, whether you’re working with simple statistical models or assessing the data and the model’s performance when training and evaluating deep learning models.
With this in mind, let’s dive into some of the core concepts within probability and statistics.