Exploring the dataset
There is no reason to go wild with the dataset. Just because we can train neural network models with TPOT doesn't mean we should spend 50+ pages exploring and transforming needlessly complex datasets.
For that reason, you'll use a single dataset throughout the chapter: the breast cancer dataset. It ships with scikit-learn, so nothing has to be downloaded from the web. Let's start by loading and exploring it:
- To begin, you'll need to load a couple of libraries. We're importing NumPy, pandas, Matplotlib, and Seaborn for easy data analysis and visualization. We're also importing the load_breast_cancer function from the sklearn.datasets module; that's the function that loads the dataset. Finally, the rcParams object is imported from Matplotlib to make the default styling a bit easier on the eyes:
  import numpy as np
  import pandas as pd
  import matplotlib.pyplot as plt
  ...
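As a rough illustration of what loading the dataset can look like once the imports are in place, here is a minimal sketch. It is not necessarily the code used in the rest of the chapter: the variable names data and df and the column name target are assumptions made for this example, and only pandas and scikit-learn are needed to run it.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load the dataset bundled with scikit-learn; no network access is needed.
data = load_breast_cancer()

# Put the features into a DataFrame and append the class labels.
# The column name "target" is our own choice for this sketch.
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

# Quick first look: dimensions, class balance, and label meanings.
print(df.shape)
print(df["target"].value_counts())
print(data.target_names)
```

Keeping the features and the target in one DataFrame makes quick checks such as df.head() or df.describe() convenient before any modeling starts.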