You're reading from Data Cleaning and Exploration with Machine Learning: Get to grips with machine learning techniques to achieve sparkling-clean data quickly

Product type Paperback

Published in Aug 2022

Publisher Packt

ISBN-13 9781803241678

Length 542 pages

Edition 1st Edition

Concepts

Machine Learning

Author (1):

Michael Walker

Decision tree and random forest regression

In this section, we will use a decision tree and a random forest to build regression models with the same income gap data we worked with earlier in this chapter. We will also tune each model to identify the hyperparameters that give the best performance, just as we did with KNN regression. Let’s get started:

We load many of the same libraries as we did for KNN regression, plus DecisionTreeRegressor (along with plot_tree) and RandomForestRegressor from scikit-learn:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor, plot_tree
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SelectFromModel
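
To give a sense of how these imports can fit together, here is a minimal sketch rather than the chapter's actual code. It assumes synthetic stand-in data instead of the income gap data, and the `tune` helper, the hyperparameter ranges, the median imputation, and the scoring metric are all illustrative choices of ours. The pipeline imputes missing values, selects features with a linear model, and then fits either a decision tree or a random forest, with a randomized search over the hyperparameters.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in data (an assumption; NOT the income gap data).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)),
                 columns=[f"x{i}" for i in range(5)])
y = 3 * X["x0"] - 2 * X["x1"] + rng.normal(scale=0.5, size=200)
X.iloc[::20, 2] = np.nan  # a few missing values for the imputer to fill

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)


def tune(model, param_distributions, n_iter):
    """Hypothetical helper: impute, select features, fit, and tune."""
    pipe = make_pipeline(
        SimpleImputer(strategy="median"),
        # Keep features whose absolute linear coefficient is at or
        # above the median coefficient size.
        SelectFromModel(LinearRegression(), threshold="median"),
        model,
    )
    search = RandomizedSearchCV(
        pipe, param_distributions, n_iter=n_iter, cv=3,
        scoring="neg_mean_absolute_error", random_state=0)
    return search.fit(X_train, y_train)


# make_pipeline names each step after its lowercased class name,
# so hyperparameters are addressed as "<stepname>__<parameter>".
dt_search = tune(
    DecisionTreeRegressor(random_state=0),
    {"decisiontreeregressor__max_depth": [2, 4, 6, 8],
     "decisiontreeregressor__min_samples_leaf": [1, 5, 10]},
    n_iter=5)

rf_search = tune(
    RandomForestRegressor(random_state=0),
    {"randomforestregressor__n_estimators": [25, 50],
     "randomforestregressor__max_depth": [2, 4, 6]},
    n_iter=4)

print(dt_search.best_params_, dt_search.best_score_)
print(rf_search.best_params_, rf_search.best_score_)
```

Because the scoring metric is the negative mean absolute error, `best_score_` values closer to zero indicate a better fit. Wrapping both estimators in the same pipeline keeps the preprocessing identical, so any difference in scores comes from the models and their hyperparameters.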