Introduction to scikit-learn regression and classification
scikit-learn is a Python library for supervised and unsupervised machine learning, built on top of the numpy and scipy libraries.
Let's demonstrate how to forecast price changes on a dataset with RidgeCV regression and classification using scikit-learn.
Generating the dataset
Let's start by generating the dataset for the following examples: a pandas DataFrame containing 20 years of daily data. The BookPressure, TradePressure, RelativeValue, and Microstructure fields represent synthetic trading signals built on this dataset (also known as features or predictors). The PriceChange field represents the daily change in prices that we are trying to predict (also known as the response or target variable). For simplicity, we make the PriceChange field a linear function of our predictors with random weights and some random noise. The Price field represents the actual price of the instrument generated using the...
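The construction described above can be sketched as follows. This is a minimal illustration and not necessarily the exact code used for the examples: the date range, the random seed, the standard-normal signal values, the weight range, and the noise scale are all assumptions chosen for the sketch. It covers only the four predictors and PriceChange.

```python
import numpy as np
import pandas as pd

# Fixed seed so the sketch is reproducible (assumption, not from the text)
rng = np.random.default_rng(seed=0)

# 20 years of daily timestamps (the specific start and end dates are assumptions)
dates = pd.date_range(start="2000-01-01", end="2019-12-31", freq="D")

features = ["BookPressure", "TradePressure", "RelativeValue", "Microstructure"]

# Synthetic trading signals: standard-normal draws (distribution is an assumption)
df = pd.DataFrame(
    rng.standard_normal((len(dates), len(features))),
    index=dates,
    columns=features,
)

# Random weights and random noise (ranges and scale are assumptions)
weights = rng.uniform(-1.0, 1.0, size=len(features))
noise = rng.normal(0.0, 0.1, size=len(dates))

# PriceChange is a linear function of the predictors plus noise
df["PriceChange"] = df[features].to_numpy() @ weights + noise
```

Because the weights are known here, a fitted linear model's coefficients can later be compared against them to check how well the model recovers the true relationship.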