Summary
This chapter demonstrated the importance of data preparation. Because the tools and algorithms used to build machine learning models are largely the same across projects, data preparation is the key that unlocks the highest levels of model performance. It allows human intelligence and creativity to have a large impact on the machine's learning process. Clever practitioners combine their strengths with the machine's by developing automated data engineering pipelines, which exploit the computer's ability to tirelessly search the data for useful insights. These pipelines are especially important in the so-called "big data regime," where data-hungry approaches like deep learning must be fed large amounts of data to avoid overfitting.
In traditional small and medium data regimes, feature engineering by hand still reigns supreme. Using intuition and subject matter expertise, one can guide the model to the...