Use case and data overview
To demonstrate the fundamental workflow, we will work through a binary classification problem in which we predict the likelihood that a loan will default. The dataset we use in this chapter can be found at https://github.com/PacktPublishing/Machine-Learning-at-Scale-with-H2O/blob/main/chapt3/loans-lite.csv. (This is a simplified version of the Kaggle Lending Club Loan dataset: https://www.kaggle.com/imsparsh/lending-club-loan-dataset-2007-2011.)
We use the simplified dataset to streamline the workflow in this chapter. In Part 2, Building State-of-the-Art Models at Scale, we will develop this use case using advanced H2O model-building capabilities on the original loan dataset.