2.6 Statistical modeling
The point of data analysis is to digest raw data and present information to people to support their decision-making. The previous stages of the pipeline have prepared two important things:
The raw data have been cleaned and standardized, yielding data that are relatively easy to analyze.
The process of inspecting and summarizing the data has helped analysts, developers, and, ultimately, users understand what the information means.
The confluence of data and deeper meaning creates significant value for an enterprise. The analysis process can continue as more formalized statistical modeling. This, in turn, may lead to artificial intelligence (AI) and machine learning (ML) applications.
The processing pipeline includes the following projects to gather summaries of individual variables as well as of combinations of variables:
Project 5.1: “Statistical Model: Core Processing”. This project builds the base application for applying statistical models and saving summary parameters about the data. It focuses on univariate summaries such as the mean, median, mode, and variance (see the first sketch after this list).
Project 5.2: “Statistical Model: Relationships”. It’s common to want to know how variables relate to one another. This project adds multivariate measures such as the correlation between pairs of variables (see the second sketch after this list).
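As a rough illustration of the Project 5.1 summaries, here is a minimal sketch using Python’s standard-library statistics module. The summarize() function name, and the assumption that cleaned data arrive as a simple list of numbers, are illustrative choices, not the book’s actual design:

```python
import statistics


def summarize(values: list[float]) -> dict[str, float]:
    """Core univariate summaries for Project 5.1: mean, median, mode, variance."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),  # most common value; ties resolve to the first seen
        "variance": statistics.variance(values),  # sample variance
    }


# Example:
# summarize([2.0, 3.0, 3.0, 5.0, 7.0])
# -> {'mean': 4.0, 'median': 3.0, 'mode': 3.0, 'variance': 4.0}
```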
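For the Project 5.2 relationships, a Pearson correlation coefficient can be computed directly from its definition, keeping the sketch within the standard library. This assumes two equal-length lists of numbers; Python 3.10 and later also provide statistics.correlation() for the same computation:

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Example: two perfectly linearly related variables have r = 1.0
# pearson_r([1, 2, 3, 4], [2, 4, 6, 8]) -> 1.0
```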
This sequence of stages produces high-quality data and provides ways to diagnose and debug problems with data sources. The sequence of projects will illustrate how automated solutions and interactive inspection can be used to create useful, timely, insightful reporting and analysis.