Least-squares curves with NumPy and SciPy
We will now learn how to fit curves to a dataset. In this section, we investigate the relationship between horsepower and mpg for a vehicle. From Figure 10.1, we know that the relationship between these two variables is not linear; hence, we will use the square of our feature variable X as an input to the model. This is called polynomial regression. The model is still linear in its coefficients, even though the curve it fits is non-linear in X.
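To make the "linear model, non-linear curve" idea concrete, here is a minimal sketch using only NumPy. The data is synthetic (the coefficients and noise level are assumptions chosen for illustration, not values from the vehicle dataset), and the names x, y, and A are hypothetical. It builds a design matrix with columns 1, x, and x^2, solves the least-squares problem, and checks the result against np.polyfit.

```python
import numpy as np

# Synthetic stand-in data (an assumption; not the book's vehicle dataset).
rng = np.random.default_rng(0)
x = np.linspace(50, 200, 40)
y = 60.0 - 0.45 * x + 0.0011 * x**2 + rng.normal(0, 0.5, x.size)

# Design matrix with columns [1, x, x^2]. The model y = b0 + b1*x + b2*x^2
# is linear in the unknowns b0, b1, b2, so ordinary least squares applies.
A = np.column_stack([np.ones_like(x), x, x**2])
coef, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

# np.polyfit solves the same problem; it returns the highest power first.
poly_coef = np.polyfit(x, y, deg=2)

print("lstsq   (b0, b1, b2):", coef)
print("polyfit (b2, b1, b0):", poly_coef)
```

The two coefficient vectors should agree up to ordering, which shows that fitting a degree-2 curve is an ordinary linear least-squares problem on transformed features.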
The following code imports the required Python packages and selects the X and y variables of interest from the pandas DataFrame, df:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

#Importing the dataset as a pandas dataframe
df = pd.read_csv("auto_dataset.csv")

#Selecting the variables of interest
X = df["horsepower"]
y = df["mpg"]

#Converting the series...
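As a hedged illustration of how the PolynomialFeatures and LinearRegression classes imported above are commonly combined for a degree-2 fit, here is a minimal sketch. It runs on synthetic data rather than auto_dataset.csv, and the names hp, mpg, X_2d, and X_poly are hypothetical; it is not necessarily the exact sequence of steps the chapter goes on to use.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for horsepower and mpg (an assumption, not real data).
rng = np.random.default_rng(0)
hp = np.linspace(50, 200, 40)
mpg = 60.0 - 0.45 * hp + 0.0011 * hp**2 + rng.normal(0, 0.5, hp.size)

# scikit-learn estimators expect a 2-D feature array of shape (n_samples, n_features).
X_2d = hp.reshape(-1, 1)

# Expand the single feature into [x, x^2]; the intercept is left to LinearRegression.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X_2d)

# Fit an ordinary linear model on the expanded features.
model = LinearRegression().fit(X_poly, mpg)
mpg_hat = model.predict(X_poly)

print("intercept:", model.intercept_)
print("coefficients for [x, x^2]:", model.coef_)
print("R^2 on the training data:", model.score(X_poly, mpg))
```

Setting include_bias=False avoids a redundant constant column, since LinearRegression fits its own intercept by default.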