Predicting heart disease
We'll put logistic regression to the test on a binary classification task using a real-world dataset from the UCI Machine Learning Repository. This time, we will be working with the Statlog (Heart) dataset, which we will refer to as the heart dataset henceforth for brevity. The dataset can be downloaded from the UCI Machine Learning Repository's website at http://archive.ics.uci.edu/ml/datasets/Statlog+%28Heart%29. The data contains 270 observations for patients with potential heart problems. Of these, 120 patients (about 44 percent) were shown to have heart problems, so the split between the two classes is fairly even. The task is to predict whether a patient has heart disease based on their profile and a series of medical tests. First, we'll load the data into a data frame and rename the columns according to the website:
> heart <- read.table("heart.dat", quote = "\"")
> names(heart) <- c("AGE", "SEX", "CHESTPAIN", "RESTBP", "CHOL", "SUGAR", "ECG", "MAXHR", "ANGINA", "DEP"...
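The class balance quoted above can be checked with a few lines of R. This is only a minimal sketch: the counts are taken from the text, the names `class_counts`, `class_share`, and `heart_raw` are made up for illustration, and the optional check on the file assumes that the outcome is stored in the last column of heart.dat.

```r
# Counts taken from the text: 270 patients, 120 of them with heart problems.
n_total   <- 270
n_disease <- 120

# Class counts and their shares of the whole dataset.
class_counts <- c(no_disease = n_total - n_disease, disease = n_disease)
class_share  <- round(class_counts / n_total, 3)
print(class_share)

# Optional: run the same check on the raw file if it is in the working
# directory. Assumption: the outcome is the last column of heart.dat.
if (file.exists("heart.dat")) {
  heart_raw <- read.table("heart.dat", quote = "\"")
  print(table(heart_raw[[ncol(heart_raw)]]))
}
```

A share of roughly 0.44 against 0.56 means that a model which always predicts the majority class would be right only a little more than half the time, so plain accuracy is a reasonable first measure for this dataset.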