Regression
With the planets dataset, we want to predict the length of the year, which is a numeric value, so we will turn to regression. As mentioned at the beginning of this chapter, regression is a technique for modeling the strength and magnitude of the relationship between independent variables (our X data), often called regressors, and the dependent variable (our y data) that we want to predict.
Linear regression
Scikit-learn provides many algorithms that can handle regression tasks, ranging from decision trees to linear regression, spread across modules according to the various algorithm classes. However, the best starting point is typically a linear regression, which can be found in the linear_model module. In linear regression, we fit our data to a line of the following form:
y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
Here, epsilon (ε) is the error term and betas (β) are coefficients.
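As a minimal sketch of how such a fit might look in practice, the following uses the LinearRegression class from the linear_model module on synthetic data rather than the planets dataset. The variable names, the random seed, and the "true" intercept and coefficients are assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (NOT the planets dataset): two regressors and a
# target built from known coefficients plus a small random error term.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                # independent variables
true_intercept = 1.5                         # assumed for illustration
true_coefs = np.array([3.0, -2.0])           # assumed for illustration
epsilon = rng.normal(scale=0.1, size=200)    # error term
y = true_intercept + X @ true_coefs + epsilon  # dependent variable

# Fit an ordinary least squares linear regression.
lm = LinearRegression().fit(X, y)

print("intercept:", lm.intercept_)
print("coefficients:", lm.coef_)
print("R^2 on the training data:", lm.score(X, y))
```

Because the data was generated from known values, the fitted intercept_ and coef_ attributes should land close to 1.5 and [3.0, -2.0], the intercept and the remaining beta coefficients used to build y. With real data, the true values are unknown and the model only estimates them.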
Important note
The coefficients we get from our model are those...