Building a baseline model
From the original temporal data, we engineered time-aggregated features over each time segment of the training data, with each segment equal in duration to the test set. For the baseline model demonstrated in this competition, we chose LGBMRegressor, one of the best-performing algorithms at the time of the competition, which in many cases performed similarly to XGBoost. The training data is split with KFold into five folds, and for each fold we run training and validation until either the final number of iterations is reached or the validation error stops improving for a specified number of steps (given by the patience parameter). For each fold, we then run the prediction on the test set with the best model, trained on the current training split, that is, on 4/5 of the training data. Finally, we average the test predictions obtained across the five folds. We can use this cross-validation...