Building a simple ML model with Python
In this section, we create a simple ML model in Python. Python has grown to be the primary go-to language for ML work (with R as the obvious alternative), and the number of packages implementing ML algorithms is difficult to overestimate. Having said that, sklearn remains the most widely used, so we will choose it for this section. As in the R part of the chapter, we will use the xgboost model because it offers a good balance between performance and explainability.
We will use the data loaded in the previous section.
Data preprocessing
The first step in the modeling phase is to prepare the data. Fortunately, sklearn comes with built-in preprocessing functionality!
Let’s review the steps involved in data preprocessing:
- Handling missing values: Before training a model, it’s essential to address missing values in the dataset. sklearn provides methods for imputing missing values or removing rows...
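As a minimal sketch of the missing-value step, the snippet below shows both options on a small, made-up DataFrame (the column names and values are hypothetical and stand in for the data loaded in the previous section). It assumes imputation is done with sklearn's SimpleImputer using the column median, and that row removal is done with pandas' dropna; other strategies may suit your data better.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; replace with the DataFrame loaded earlier.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "income": [50000.0, 62000.0, np.nan, 58000.0],
})

# Option 1: drop every row that contains at least one missing value.
df_dropped = df.dropna()

# Option 2: replace each missing value with the median of its column.
imputer = SimpleImputer(strategy="median")
df_imputed = pd.DataFrame(
    imputer.fit_transform(df),  # returns a NumPy array
    columns=df.columns,
    index=df.index,
)

print(df_dropped)
print(df_imputed)
```

Dropping rows is simplest but discards data, while imputation keeps every row at the cost of introducing estimated values. In a real workflow the imputer would typically be fitted on the training split only and then applied to the test split.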