Data modeling is the process of using data to build predictive models. Data can also be used for descriptive and prescriptive analysis. Before we can make use of data, however, it has to be fetched from several sources, stored, assimilated, cleaned, and engineered to suit our goal. These sequential operations are akin to a manufacturing pipeline: each step adds value to the eventual end product, and each stage calls for a different person or skill set.
The various steps in a data analytics pipeline are shown in the following diagram:
Steps in the data analytics pipeline
- Extract Data
- Transform Data
- Load Data
- Read & Process Data
- Exploratory Data Analysis
- Create Features
- Build Predictive Models
- Validate Models
- Build Products
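To make the flow of the steps listed above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the inline CSV data, the function names (`extract`, `transform`, `load`, `create_features`, `build_model`, `validate`), and the toy "model", which is just a group average. A real pipeline would read from actual sources, use a proper modeling library, and validate on held-out data.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data standing in for an external source.
RAW_CSV = """city,temp_c,sales
Austin,30,120
Boston,,80
Chicago,18,95
Denver,25,110
"""


def extract(source):
    """Extract: read raw rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(source)))


def transform(rows):
    """Transform: drop rows with missing fields and cast numeric columns."""
    clean = []
    for row in rows:
        if all(row.values()):  # an empty string marks a missing value
            clean.append({
                "city": row["city"],
                "temp_c": float(row["temp_c"]),
                "sales": float(row["sales"]),
            })
    return clean


def load(rows, store):
    """Load: persist the cleaned rows (here, an in-memory list)."""
    store.extend(rows)
    return store


def create_features(rows):
    """Create features: derive a boolean column from the temperature."""
    return [{**r, "is_warm": r["temp_c"] >= 20} for r in rows]


def build_model(rows):
    """Build a toy model: average sales for each value of is_warm."""
    groups = {}
    for r in rows:
        groups.setdefault(r["is_warm"], []).append(r["sales"])
    return {key: mean(values) for key, values in groups.items()}


def validate(model, rows):
    """Validate: mean absolute error (on the training rows, for brevity)."""
    return mean(abs(model[r["is_warm"]] - r["sales"]) for r in rows)


warehouse = []  # hypothetical stand-in for a database or data warehouse
rows = load(transform(extract(RAW_CSV)), warehouse)
features = create_features(rows)
model = build_model(features)
error = validate(model, features)
```

Each function consumes the output of the previous one, which is what makes the sequence a pipeline: a problem introduced early (for example, the missing temperature for Boston) has to be handled before later stages can rely on the data. The exploratory analysis and product-building steps are omitted here because they do not reduce to a single function call.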
These steps can be combined into three high-level categories: data engineering, data science...