In this section, we will use pandas to analyze and preprocess the data before passing it as input to scikit-learn.
Data analysis and preprocessing using pandas
Examining the data
To start preprocessing, let's read the training dataset into a pandas DataFrame and display its first three rows:
In [2]: import pandas as pd
        import numpy as np
        # For .read_csv, always use header=0 when you know row 0 is the header row
        train_df = pd.read_csv('csv/train.csv', header=0)
In [3]: train_df.head(3)
The output is as follows:
Hence, we can see the various features...
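Besides head(), a few other quick checks are often useful when examining a freshly loaded DataFrame. The following is a minimal sketch only: it reads a tiny in-memory CSV whose column names and values (id, age, fare, label) are invented for illustration and are not taken from train.csv. It then inspects the column types, the number of missing values per column, and summary statistics for the numeric columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for csv/train.csv: these columns and values are
# made up for illustration and do not come from the real training file.
SAMPLE_CSV = """id,age,fare,label
1,22.0,7.25,0
2,,71.28,1
3,26.0,7.92,1
4,35.0,,0
"""

# header=0 tells read_csv that row 0 holds the column names.
sample_df = pd.read_csv(io.StringIO(SAMPLE_CSV), header=0)

# The dtype of each column shows which features are numeric and which are text.
column_types = sample_df.dtypes

# Missing values per column indicate what preprocessing will have to handle.
missing_counts = sample_df.isnull().sum()

# Count, mean, spread, and quartiles for the numeric columns.
summary = sample_df.describe()

print(sample_df.head(3))
print(column_types)
print(missing_counts)
print(summary)
```

The same three calls (dtypes, isnull().sum(), and describe()) could be applied to train_df once it has been loaded as shown earlier.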