Defining a Sample Use Case
To explore the topics in this chapter with a practical example, we use PimaIndiansDiabetes, a small dataset that ships with the mlbench package and is well suited to classification use cases.
The dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. The use case we derive from it is predicting whether a patient has diabetes as a function of a few medical diagnostic measurements.
Note
Additional information can be found at http://math.furman.edu/~dcs/courses/math47/R/library/mlbench/html/PimaIndiansDiabetes.html.
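As a minimal sketch of how the dataset might be loaded and inspected, the following R snippet assumes that the mlbench package is already installed (for example, via install.packages("mlbench")) and that the outcome column is named diabetes, as it is in the mlbench distribution of the data. It is only an illustration, not necessarily the exact code used later in the chapter.

```r
# Load the mlbench package, which bundles the PimaIndiansDiabetes dataset
library(mlbench)

# Attach the dataset to the current session as a data frame
data(PimaIndiansDiabetes)

# Check the size of the dataset (rows and columns)
dim(PimaIndiansDiabetes)

# Review column names and types; the diagnostic measurements are the
# predictors and 'diabetes' is the class label to be predicted
str(PimaIndiansDiabetes)

# Count how many patients fall into each outcome class
table(PimaIndiansDiabetes$diabetes)
```

The call to dim() confirms the small size of the dataset, and table() shows how the two outcome classes are distributed, which is worth knowing before training any classifier.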
The choice of a dataset with fewer than 1,000 rows is intentional. With large datasets, the topics explored in this chapter demand long computation times on commodity hardware. Using a small dataset for demonstration lets most readers reproduce the results in a reasonable amount of time...