Summary
This chapter served as an introduction to machine learning in Python. We discussed the terminology that's commonly used to describe learning types and tasks. Then, we practiced EDA using the skills we learned throughout this book to get a feel for the wine and planet datasets. This gave us some ideas about what kinds of models we would want to build. A thorough exploration of the data is essential before attempting to build a model.
Next, we learned how to prepare our data for use in machine learning models and the importance of splitting the data into training and testing sets before modeling. In order to prepare our data efficiently, we used pipelines in scikit-learn to package up everything from our preprocessing through our model.
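As a quick reminder of what that workflow looks like, here is a minimal sketch. It assumes synthetic stand-in data rather than the wine or planet datasets, and the step names (`scale`, `lr`), the 75/25 split, and the choice of `StandardScaler` with `LinearRegression` are illustrative choices, not necessarily the ones used earlier in the chapter:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Split first, so the test set plays no part in fitting the scaler or the model
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Package the preprocessing and the model into a single estimator
pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('lr', LinearRegression())
])

# Fitting the pipeline fits the scaler on the training data only
pipeline.fit(X_train, y_train)

# R^2 on the held-out test set
r2 = pipeline.score(X_test, y_test)
```

Because the scaler lives inside the pipeline, calling `fit()` learns the scaling parameters from the training data alone, and `score()` or `predict()` reuses them on the test data, which helps us avoid leaking information from the test set into the model.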
We used unsupervised k-means to cluster the planets using their semi-major axis and period; we also discussed how to use the elbow point method to find a good value for k. Then, we moved on to supervised learning and made a linear regression...