Creating a Data Science Pipeline
OSEMN is one of the most common pipelines for approaching a data science problem. It is pronounced "awesome".
OSEMN stands for the following:
Obtaining the data, which can come from any source and may be structured, unstructured, or semi-structured.
Scrubbing the data, which means getting your hands dirty and cleaning it, for example by renaming columns and imputing missing values.
Exploring the data to find the relationships between the variables, which includes searching for correlations among the variables and finding the relationship between the explanatory variables and the response variable.
Modeling the data, which can include prediction, forecasting, and clustering.
iNterpreting the data, which means combining all the analyses and results to draw a conclusion.
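The five steps above can be sketched as a chain of small functions, each feeding the next. This is only an illustrative sketch using the Python standard library: the toy dataset, the column names, and the function names (obtain, scrub, explore, model, interpret) are all assumptions made for the example, and a real project would usually rely on dedicated libraries for each step.

```python
import csv
import io
from statistics import mean

# Hypothetical toy data standing in for a real source (file, API, database).
# The second row has a missing exam score on purpose.
RAW_CSV = """\
Hours Studied,Exam Score
1,52
2,
3,61
4,68
5,74
"""


def obtain(text):
    """O: read rows from a source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """S: rename columns and impute missing scores with the mean of known scores."""
    known = [float(r["Exam Score"]) for r in rows if r["Exam Score"]]
    fill = mean(known)
    return [
        {
            "hours": float(r["Hours Studied"]),
            "score": float(r["Exam Score"]) if r["Exam Score"] else fill,
        }
        for r in rows
    ]


def _sums(rows):
    """Shared helper: means, co-deviation sum, and squared-deviation sums."""
    xs = [r["hours"] for r in rows]
    ys = [r["score"] for r in rows]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy


def explore(rows):
    """E: Pearson correlation between the explanatory and response variable."""
    _, _, sxy, sxx, syy = _sums(rows)
    return sxy / (sxx * syy) ** 0.5


def model(rows):
    """M: fit a least-squares line, score = intercept + slope * hours."""
    mx, my, sxy, sxx, _ = _sums(rows)
    slope = sxy / sxx
    return slope, my - slope * mx


def interpret(corr, slope, intercept):
    """N: combine the results into a plain-language conclusion."""
    return (
        f"Correlation between hours and score is {corr:.2f}; "
        f"each extra hour is associated with about {slope:.1f} more points "
        f"(baseline {intercept:.1f})."
    )


rows = obtain(RAW_CSV)
clean = scrub(rows)
corr = explore(clean)
slope, intercept = model(clean)
print(interpret(corr, slope, intercept))
```

Keeping each stage as its own function mirrors the pipeline idea: any single step can be swapped out, for instance a different imputation rule in scrub, without touching the others.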
Obtaining the Data
This step refers to collecting data. Data can be obtained from a single source or from multiple sources. In the real world, collecting data is not always easy since...