Understanding Data from a Modeling Perspective
When attempting to understand a business problem in the context of machine learning, we need to identify whether the problem lends itself to supervised or unsupervised learning. Can the problem be solved by modeling a target variable? If yes, then is this variable available in the dataset? Is it numerical or categorical? The answers to these questions will allow us to identify which modeling algorithms will be relevant to our problem. The following diagram provides an overview of this process:
The preceding flowchart describes the paths we can follow to categorize a dataset for modeling. At the first branch, we identify whether a target variable exists. For example, in a weather forecast model, the target could be a column recording historical rainfall amounts. Or perhaps the target variable is a column labeling...
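The questions raised in this section (does a target variable exist, and is it numerical or categorical?) can also be sketched in code. The following is a minimal illustration rather than a definitive procedure. The function name `suggest_learning_task`, the sample columns, and the returned labels are all hypothetical. It also assumes the usual mapping of a numerical target to regression and a categorical target to classification.

```python
from numbers import Number
from typing import Optional, Sequence


def suggest_learning_task(target: Optional[Sequence[object]]) -> str:
    """Return a coarse task label given a dataset's target column, if any.

    Hypothetical helper for illustration; the labels are assumptions.
    """
    # No target variable available: treat the problem as unsupervised.
    if target is None or len(target) == 0:
        return "unsupervised"

    # bool is a subclass of int in Python, so exclude it explicitly:
    # True/False flags act as labels, not as quantities.
    is_numeric = all(
        isinstance(value, Number) and not isinstance(value, bool)
        for value in target
    )

    # Numerical target -> regression; anything else -> classification.
    if is_numeric:
        return "supervised: regression"
    return "supervised: classification"


# Hypothetical sample columns for a weather dataset.
rainfall_mm = [0.0, 12.5, 3.2, 7.8]        # numerical target
sky_label = ["sunny", "rainy", "cloudy"]   # categorical target

print(suggest_learning_task(rainfall_mm))  # supervised: regression
print(suggest_learning_task(sky_label))    # supervised: classification
print(suggest_learning_task(None))         # unsupervised
```

One limitation of this sketch is that it inspects only Python value types. A categorical variable stored as integer codes (for example, 0, 1, 2 for three weather conditions) would be reported as numerical, so in practice the check would need to be combined with knowledge of what the column represents.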