Summary
In this chapter, we focused on the steps that come before training machine learning models. We discussed how to plan a machine learning strategy and learned several hands-on methods for preparing a dataset for modeling.
Starting with a high-level view, we looked at how to approach data science problems by examining the available data, determining business needs, and assessing whether the data is suitable. Next, we discussed how to understand data from a modeling perspective, such as identifying whether a dataset lends itself to supervised or unsupervised learning.
Having covered these big-picture ideas, we paid particular attention to data preparation, which should be performed prior to modeling. We saw how to merge datasets, drop or fill missing values, transform categorical features, and split datasets into training and testing sets.
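As a compact recap, the four preparation steps listed above (merge, handle missing values, encode categorical features, split) can be sketched with pandas. This is a minimal sketch on a hypothetical toy dataset, not the chapter's own code: the tables and the column names `emp_id`, `salary`, `satisfaction`, and `left` are invented for illustration. The split uses pandas sampling to keep the example dependency-light; scikit-learn's `train_test_split` is a common alternative.

```python
import pandas as pd

# Hypothetical toy tables; column names are illustrative only.
employees = pd.DataFrame({
    "emp_id": [1, 2, 3, 4, 5],
    "salary": ["low", "medium", "high", "low", None],
    "satisfaction": [0.38, None, 0.72, 0.41, 0.90],
})
outcomes = pd.DataFrame({
    "emp_id": [1, 2, 3, 4, 5],
    "left": [1, 0, 0, 1, 0],
})

# 1. Merge the two tables on their shared key column.
df = employees.merge(outcomes, on="emp_id", how="inner")

# 2. Handle missing values: fill the numeric gap with the column mean,
#    then drop any rows still missing data (here, the missing salary).
df["satisfaction"] = df["satisfaction"].fillna(df["satisfaction"].mean())
df = df.dropna()

# 3. Transform the categorical feature into one-hot indicator columns.
df = pd.get_dummies(df, columns=["salary"], dtype=int)

# 4. Split into training and testing sets (75/25 here), seeded for repeatability.
train = df.sample(frac=0.75, random_state=0)
test = df.drop(train.index)

print(df.columns.tolist())
print(len(train), len(test))
```

Filling keeps the row with a missing satisfaction score, while the row with a missing salary is dropped; which strategy fits a real dataset depends on how much data is missing and why.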
Finally, we introduced the Human Resource Analytics dataset and put what we learned into practice by cleaning...