Let's dive into the dataset to understand the kind of data we are working with. First, we load the CSV file into a pandas DataFrame:
import pandas as pd
df = pd.read_csv('diabetes.csv')  # relative path: diabetes.csv must be in the current working directory
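Before exploring further, it can help to confirm that the file loaded with the expected width. The sketch below wraps the same `pd.read_csv` call in a small helper. The function name `load_diabetes` and the nine-column check are our own illustrative assumptions, not part of the original code.

```python
import pandas as pd


def load_diabetes(path="diabetes.csv") -> pd.DataFrame:
    """Hypothetical helper: load the CSV and sanity-check its width."""
    df = pd.read_csv(path)  # the first row of the file is used as the header
    # Assumption: the file contains the nine columns described in this section.
    if df.shape[1] != 9:
        raise ValueError(f"expected 9 columns, found {df.shape[1]}")
    return df
```

Usage would be `df = load_diabetes()` followed by `print(df.shape)`, which prints a `(rows, columns)` tuple. Raising early on an unexpected width makes a wrong or truncated file fail loudly instead of surfacing later as a confusing `KeyError`.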
Let's take a quick look at the first five rows of the dataset by calling the df.head() method:
print(df.head())
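`head()` returns the first `n` rows and defaults to `n=5`, while `tail()` does the same from the end of the frame. The toy DataFrame below uses made-up ages (not records from `diabetes.csv`) so the behavior can be seen without the file:

```python
import pandas as pd

# Toy frame with ten made-up ages (21..30); not data from diabetes.csv.
toy = pd.DataFrame({"Age": list(range(21, 31))})

first_five = toy.head()     # default n=5
first_three = toy.head(3)   # explicit n
last_two = toy.tail(2)      # rows taken from the end

print(len(first_five))               # 5
print(first_three["Age"].tolist())   # [21, 22, 23]
print(last_two["Age"].tolist())      # [29, 30]
```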
We get the following output:
We can see that the dataset has nine columns:
- Pregnancies: Number of previous pregnancies
- Glucose: Plasma glucose concentration
- BloodPressure: Diastolic blood pressure
- SkinThickness: Skin fold thickness measured from the triceps
- Insulin: Blood serum insulin concentration
- BMI: Body mass index
- DiabetesPedigreeFunction: A score summarizing the patient's genetic predisposition to diabetes, derived from the patient's family history of diabetes
- Age: Age in years
- Outcome...
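The column list can double as a quick schema check. The sketch below uses a hypothetical helper, `compare_columns`, to report expected columns that are missing and any extra columns that are present. It is our own illustration and assumes the CSV header uses exactly these names.

```python
import pandas as pd

# Column names as listed above.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def compare_columns(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (missing, unexpected) column names."""
    present = list(df.columns)
    missing = [c for c in EXPECTED_COLUMNS if c not in present]
    unexpected = [c for c in present if c not in EXPECTED_COLUMNS]
    return missing, unexpected


# An empty frame that has only the expected header passes the check.
empty = pd.DataFrame(columns=EXPECTED_COLUMNS)
print(compare_columns(empty))  # ([], [])
```

Calling `compare_columns(df)` on the loaded dataset should return two empty lists if the header matches the description.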