In this chapter, we walked through a typical workflow for machine learning problems: how to extract informative features from raw data, how to use data and labels to train a machine learning model, and how to use the finalized model to predict labels for new data. We learned that it is essential to split the data into a training set and a test set, because performance on data the model has never seen during training is what tells us how well it will generalize to new data points.
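As a compact reminder of that workflow, here is a minimal sketch using scikit-learn. The choice of the Iris dataset, the k-nearest-neighbors classifier, the 80/20 split, and the random seed are all assumptions made for illustration; they are not prescribed by the chapter, and any dataset and classifier would follow the same pattern.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset: 150 samples, 4 features, 3 classes.
# (Illustrative choice; any feature matrix X and label vector y would do.)
X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples as a test set. `stratify=y` keeps the class
# proportions the same in both subsets; `random_state` makes the split
# reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train the model on the training set only.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Predict labels for the held-out samples and measure accuracy on them.
y_pred = model.predict(X_test)
test_accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {test_accuracy:.2f}")
```

The accuracy reported here is computed only on samples the model never saw during fitting, which is why it serves as an estimate of generalization rather than of memorization.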
On the software side of things, we significantly improved our Python skills. We learned how to use NumPy arrays to store and manipulate data and how to use Matplotlib for data visualization. We talked about scikit-learn and its many useful data resources. Finally, we also looked at OpenCV's own TrainData container, which provides some relief for users of OpenCV's C++ API, who cannot fall back on Python libraries for data handling.
With these tools in hand, we are now ready to implement...