KNN for binary classification
The KNN algorithm shares some of the advantages of the decision tree algorithm. It makes no assumptions about the distribution of features or residuals. It is a suitable algorithm for the heart disease model we tried to build in the last two chapters, because KNN has to compute distances to the training observations at prediction time and so works best when the data is modest in size. Our dataset is not very large (30,000 observations) and does not have too many features.
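To make the idea concrete, here is a minimal sketch of how KNN classifies a new observation: it finds the k closest training observations and takes a majority vote of their labels. This is not the code we use for the heart disease model. The function name `knn_predict`, the toy data, and the choice of Euclidean distance with k=3 are all assumptions made for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict a binary label (0 or 1) for x_new by majority vote
    of its k nearest training observations. Assumes k is odd, so
    the vote cannot tie."""
    # Euclidean distance from x_new to every training observation
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training observations
    nearest = np.argsort(distances)[:k]
    # Predict 1 when more than half of the neighbors are labeled 1
    return int(y_train[nearest].sum() * 2 > k)

# Made-up toy data: two features and a binary target
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

pred_low = knn_predict(X_train, y_train, np.array([1.1, 1.0]))
pred_high = knn_predict(X_train, y_train, np.array([3.0, 3.0]))
```

Notice that nothing is estimated ahead of time: the prediction depends only on the stored observations and the distances to them. That is why no distributional assumptions are needed, and also why features measured on very different scales usually need to be standardized first.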
Note
The heart disease dataset is available for public download at https://www.kaggle.com/datasets/kamilpytlak/personal-key-indicators-of-heart-disease. It is derived from 2020 survey data collected by the United States Centers for Disease Control and Prevention on more than 400,000 individuals. I have randomly sampled 30,000 observations from this dataset for the analysis in this section. Data columns include whether respondents ever had heart disease, body mass index, smoking history, heavy alcohol drinking, age, diabetes, and kidney disease.
Let’s get started with our model:
- First, we must load some of...