Summary
In this chapter, we walked through the steps involved in building a custom ML pipeline. Some of the data preprocessing and analytics concepts may have been familiar, but the important lesson is that data experimentation is a structured, step-by-step approach rather than unstructured trial and error. Look for missing values, examine the data distribution, and study the relationships between features and targets. This analysis helps you decide which preprocessing steps to perform and what model performance to expect.
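As a reminder of what those three checks can look like in practice, here is a minimal sketch using only the Python standard library. The toy dataset, the column names (`rooms`, `area`, `price`), and the helper functions `missing_counts`, `describe`, and `pearson` are all hypothetical and chosen for illustration; on a real project you would more likely run the equivalent checks with a data frame library.

```python
import math
import statistics

# Hypothetical toy dataset: each row is a dict; None marks a missing value.
rows = [
    {"rooms": 2, "area": 50.0, "price": 150.0},
    {"rooms": 3, "area": 70.0, "price": 210.0},
    {"rooms": None, "area": 65.0, "price": 200.0},
    {"rooms": 4, "area": 90.0, "price": 270.0},
    {"rooms": 5, "area": None, "price": 330.0},
]


def missing_counts(rows):
    """Count missing (None) values per column."""
    columns = rows[0].keys()
    return {c: sum(1 for r in rows if r[c] is None) for c in columns}


def describe(rows, column):
    """Summarize the distribution of one numeric column, skipping missing values."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def pearson(rows, feature, target):
    """Pearson correlation between a feature and the target,
    computed only on rows where both values are present."""
    pairs = [
        (r[feature], r[target])
        for r in rows
        if r[feature] is not None and r[target] is not None
    ]
    xs, ys = zip(*pairs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# 1. Missing values, 2. distribution of a feature, 3. feature-target relationship.
missing = missing_counts(rows)
area_stats = describe(rows, "area")
corr = {f: pearson(rows, f, "price") for f in ("rooms", "area")}

print("missing values:", missing)
print("area distribution:", area_stats)
print("correlation with price:", corr)
```

A column with many missing values points to an imputation or filtering step, a skewed distribution may call for a transformation, and a feature with a strong correlation to the target gives a first hint of the performance a simple model might reach.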
You now know that data preprocessing, or feature engineering, is the most important part of the whole ML process. The more prior knowledge you have about the data, the better you can encode categorical and temporal variables or transform text into a numerical representation using NLP techniques. You learned that choosing the proper ML task, model, error metric, and train-test split is mostly defined by business decisions (for example, object detection versus segmentation) or a performance...