Case Study
In this case study, we will predict the likelihood that a person will be diagnosed with diabetes, based on several diagnostic measurements of that person.
The dataset that you will be using in this chapter comes from this database: https://www.kaggle.com/uciml/pima-indians-diabetes-database.
This dataset contains several medical predictor (independent) variables and one target variable, Outcome. Its columns are as follows:
- Pregnancies: Number of times pregnant
- Glucose: Plasma glucose concentration after 2 hours in an oral glucose tolerance test
- BloodPressure: Diastolic blood pressure (mm Hg)
- SkinThickness: Triceps skin fold thickness (mm)
- Insulin: 2‐Hour serum insulin (μU/mL)
- BMI: Body mass index (weight in kg/(height in m)^2)
- DiabetesPedigreeFunction: Diabetes pedigree function
- Age: Age (years)
- Outcome: 0 (non‐diabetic) or 1 (diabetic)
The dataset has 768 records, and all patients are females at least 21 years of age and of Pima Indian descent.
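As a quick orientation, the column layout described above can be sketched in code. The following is a minimal sketch using only Python's standard library; the helper names `read_records` and `split_features_target` are hypothetical, and the two sample rows are made up purely for illustration (they are not taken from the dataset). It checks that the header matches the nine columns listed above and separates the eight predictors from the Outcome target.

```python
import csv
import io

# Column names as listed in the dataset description above.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def read_records(lines):
    """Parse CSV lines into a list of dicts with numeric values.

    Hypothetical helper: raises ValueError if the header row does not
    match the expected column names.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return [{name: float(value) for name, value in row.items()}
            for row in reader]


def split_features_target(records):
    """Separate the eight predictors (X) from the Outcome target (y)."""
    feature_names = EXPECTED_COLUMNS[:-1]
    X = [[record[name] for name in feature_names] for record in records]
    y = [int(record["Outcome"]) for record in records]
    return X, y


# Two made-up rows for illustration only; NOT real records from the dataset.
SAMPLE = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,25,80,28.5,0.35,30,0
5,160,85,30,150,35.1,0.60,45,1
"""

records = read_records(io.StringIO(SAMPLE))
X, y = split_features_target(records)
print(len(records), "records,", len(X[0]), "features, targets:", y)
```

To run the same check against the downloaded file, pass an open file object instead of the sample text, for example `with open("diabetes.csv", newline="") as f: records = read_records(f)`. The filename `diabetes.csv` is an assumption about how the file is saved locally. If the file loads correctly, `len(records)` should be 768.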