As in the previous chapter, we will show how to quickly train a linear model with scikit-learn directly from a SageMaker notebook instance. First, create the notebook instance (choosing conda_python3 as the kernel).
- We will start by loading the training data into a pandas dataframe:
import pandas as pd

# SRC_PATH is assumed to have been defined earlier in the notebook
housing_df = pd.read_csv(SRC_PATH + 'train.csv')
housing_df.head()
The preceding code displays the following output:
- The last column (medv) stands for median value and is the variable we are trying to predict (the dependent variable) from the values in the remaining columns (the independent variables).
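As a minimal sketch of how the label and the feature columns might be separated, consider the following. It assumes a tiny made-up dataframe whose last column is medv; the values, the feature column names, and the definitions of `label` and `training_features` are illustrative assumptions rather than the chapter's actual data or code:

```python
import pandas as pd

# Hypothetical miniature dataframe standing in for the housing data;
# the numbers and feature names are made up purely for illustration.
example_df = pd.DataFrame({
    "crim": [0.1, 0.2, 0.3],
    "rm": [6.5, 6.1, 7.0],
    "medv": [24.0, 21.6, 34.7],
})

# Assumption: the label is the last column, as described above.
label = example_df.columns[-1]

# Every other column is treated as an independent variable (feature).
training_features = [col for col in example_df.columns if col != label]

# Reorder so the label comes first, followed by the features.
example_reordered = example_df[[label] + training_features]
print(list(example_reordered.columns))
```

Deriving the feature list from the dataframe's columns, rather than typing it by hand, keeps the label and features in sync if columns are added or removed.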
As usual, we will split the dataset for training and testing:
from sklearn.model_selection import train_test_split
housing_df_reordered = housing_df[[label] + training_features]
training_df,...
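The split-then-train workflow can also be sketched end to end on synthetic data. This is only an illustration under stated assumptions: the dataframe, the column names (`y`, `x1`, `x2`), the 80/20 split, and the `random_state` value are all hypothetical and are not taken from the housing dataset or from the chapter's code:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: the label is an exact linear function of two features.
rng = np.random.default_rng(0)
synthetic_df = pd.DataFrame(rng.normal(size=(100, 2)), columns=["x1", "x2"])
# Put the label in the first column, mirroring the label-first ordering above.
synthetic_df.insert(0, "y", 3.0 * synthetic_df["x1"] - 2.0 * synthetic_df["x2"] + 1.0)

# Hold out 20% of the rows for testing (test_size and random_state are assumptions).
train_df, test_df = train_test_split(synthetic_df, test_size=0.2, random_state=42)

# Fit an ordinary least squares linear model on the training rows only.
model = LinearRegression()
model.fit(train_df[["x1", "x2"]], train_df["y"])

# Evaluate with R^2 on the held-out rows.
score = model.score(test_df[["x1", "x2"]], test_df["y"])
print(len(train_df), len(test_df), round(score, 3))
```

Because the synthetic label contains no noise, the fitted coefficients should recover the generating values almost exactly; with real data such as the housing set, the held-out score will be lower and is the number worth watching.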