Applying automated regression modeling to the insurance dataset
This section demonstrates how to apply an automated machine learning solution to a slightly more complicated dataset. You will use the medical insurance cost dataset (https://www.kaggle.com/mirichoi0218/insurance) to predict insurance costs from a handful of predictor variables. You will learn how to load the dataset, perform exploratory data analysis, prepare the data, and find the best machine learning pipeline with TPOT:
- As with the previous example, the first step is to load the libraries and the dataset. We'll need numpy, pandas, matplotlib, and seaborn to start with the analysis. Here's how to import the libraries and load the dataset:

  import numpy as np
  import pandas as pd
  import matplotlib.pyplot as plt
  import seaborn as sns
  from matplotlib import rcParams

  rcParams['axes.spines.top'] = False
  rcParams['axes.spines.right'] = False

  df = pd.read_csv(...
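Before moving on to exploration, it can help to confirm that the file loaded with the columns you expect. The following is a minimal sketch, not part of the original walkthrough: the helper name `load_insurance`, the `EXPECTED_COLUMNS` list, and the two sample rows are assumptions made for illustration. The column names are those published with the Kaggle dataset, and the sample rows are made up so that the snippet runs without the real file.

```python
import io

import pandas as pd

# Column names as published with the Kaggle dataset
# (assumed to be unchanged in your local copy).
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]


def load_insurance(source):
    """Hypothetical helper: read the CSV and verify that the expected columns exist.

    `source` can be a file path or any file-like object accepted by pd.read_csv.
    """
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


# Made-up rows standing in for the real file, so the sketch is self-contained.
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,25.0,0,no,southwest,4000.0\n"
    "45,male,31.5,2,yes,northeast,30000.0\n"
)

df_sample = load_insurance(sample_csv)
print(df_sample.shape)   # number of rows and columns
print(df_sample.dtypes)  # numeric vs. text (object) columns
```

With the real dataset, you would pass the path to your downloaded CSV file instead of `sample_csv`. Failing early on a missing column is a design choice: it surfaces a wrong or renamed file at load time rather than later, in the middle of the analysis.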