Building machine learning applications, while similar in many respects to the standard engineering paradigm, differs in one crucial aspect: the need to work with data as a raw material. The success of your project will, in large part, depend on the quality of the data you acquire, as well as your handling of that data. And because working with data falls into the domain of data science, it is helpful to understand the data science workflow:
The process involves these six steps in the following order:
- Acquisition
- Inspection
- Preparation
- Modeling
- Evaluation
- Deployment
Frequently, there is a need to circle back to prior steps, such as when inspecting and preparing the data, or when evaluating and modeling, but the process at a high level can be as described in the preceding list.
Let's now discuss each step in detail.