Now that we understand how linear regression works, let's apply it to a real dataset to demonstrate a more practical use case.
The Boston dataset is a small dataset of house prices in the city of Boston. It contains 506 samples and 13 features. Let's load the data into a DataFrame, as follows:
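import pandas as pd
from sklearn.datasets import load_boston

# load_boston was removed in scikit-learn 1.2, so this needs an earlier release
boston = load_boston()
# One column per feature, named after boston.feature_names
df_dataset = pd.DataFrame(
    boston.data,
    columns=boston.feature_names,
)
# Append the house prices as the target column
df_dataset['target'] = boston.target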
Data exploration
It's important to make sure you do not have any null values in your data; otherwise, most scikit-learn estimators will raise an error. Here, I will count the null values in each column, then add those counts together. If I get 0, then I am a happy man:
df_dataset.isnull().sum().sum() # Luckily, the result is zero
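To see what each step of that chained call returns, here is a minimal sketch on a made-up DataFrame. The name df_toy, its columns, and its values are hypothetical and only serve as an illustration; they are not taken from the Boston data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing values, for illustration only
df_toy = pd.DataFrame({
    "rooms": [6.5, np.nan, 7.1],
    "age": [65.2, 78.9, np.nan],
    "target": [24.0, 21.6, 34.7],
})

# First sum(): a Series holding the number of nulls in each column
nulls_per_column = df_toy.isnull().sum()

# Second sum(): a single number for the whole DataFrame
total_nulls = int(nulls_per_column.sum())

print(nulls_per_column)
print(total_nulls)
```

Keeping the per-column counts around is handy when the total is not zero, since they tell you which columns need attention.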
For a regression problem, the most important thing to do is to...