External Memory Usage
When you have an exceptionally large dataset that you can't load into RAM, the external memory feature of the XGBoost library will come to your rescue. This feature trains XGBoost models without loading the entire dataset into RAM.
Using this feature requires minimal effort; you just need to append a # character followed by the name of a cache file (the cache prefix) to the end of the filename:
train = xgb.DMatrix('data/wholesale-data.dat.train#train.cache')
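From here, training proceeds exactly as with an in-memory DMatrix; XGBoost reads batches from the on-disk cache instead of holding everything in RAM. The following is a minimal sketch, assuming a binary classification problem; the parameter values are illustrative placeholders, not recommendations.

import xgboost as xgb

# The '#train.cache' suffix tells XGBoost to build its external-memory
# cache files next to the data file on disk.
train = xgb.DMatrix('data/wholesale-data.dat.train#train.cache')

# Illustrative parameters (assumptions, not tuned values).
params = {'objective': 'binary:logistic', 'max_depth': 4, 'eta': 0.1}

model = xgb.train(params, train, num_boost_round=50)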
This feature supports only libsvm files, so we will now convert a dataset loaded in pandas into a libsvm file for use with the external memory feature.
Note
You might have to do this in batches depending on how big your dataset is.
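For illustration, here is a minimal sketch of the batched approach. It assumes the raw data sits in a hypothetical CSV file with a label column named target; both the path and the column name are placeholders. dump_svmlight_file accepts a file object opened in binary mode, so successive chunks can be appended to the same file.

import pandas as pd
from sklearn.datasets import dump_svmlight_file

# Hypothetical input path and label column; adjust to your data.
with open('data/wholesale-data.dat.train', 'wb') as f:
    for chunk in pd.read_csv('data/wholesale-data.csv', chunksize=100000):
        X_chunk = chunk.drop(columns=['target'])
        y_chunk = chunk['target']
        # Each call appends the current chunk to the open file handle.
        dump_svmlight_file(X_chunk, y_chunk, f, zero_based=True)

If the whole dataset fits in memory, a single call does the job: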
from sklearn.datasets import dump_svmlight_file

# Write the predictors and targets to disk in libsvm format.
dump_svmlight_file(X_train, Y_train, 'data/wholesale-data.dat.train', zero_based=True, multilabel=False)
Here, X_train and Y_train are the predictor and target variables, respectively. The libsvm file will be saved in the data folder.
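As an optional sanity check, assuming scikit-learn is available, you can read the file back and confirm that it round-trips before handing it to XGBoost:

from sklearn.datasets import load_svmlight_file

# Reload the dumped file and inspect its dimensions.
X_check, y_check = load_svmlight_file('data/wholesale-data.dat.train')
print(X_check.shape, y_check.shape)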