Applying gblinear
It's challenging to find real-world datasets that work best with linear models. It's often the case that real data is messy and more complex models like tree ensembles produce better scores. In other cases, linear models may generalize better.
The success of machine learning algorithms depends on how they perform with real-world data. In the next section, we will apply gblinear
to the Diabetes dataset first and then to a linear dataset by construction.
Applying gblinear to the Diabetes dataset
The Diabetes dataset is a regression dataset of 442 diabetes patients provided by scikit-learn. The prediction columns include age, sex, BMI (body mass index), BP (blood pressure), and five serum measurements. The target column is the progression of the disease after 1 year. You can read about the dataset in the original paper here: http://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf.
Scikit-learn's datasets are already split into predictor...