Next, we will combine the training (grp=1) and testing (grp=0) datasets into one dataframe and manually calculate some accuracy statistics:
- preds$error: this is the absolute difference between the outcome (0,1) and the prediction. Recall that for a binary regression model, the prediction represents the probability that the event (diabetes) will occur.
- preds$errorsqr: this is the squared error. Squaring removes the sign and also penalizes large errors more heavily than small ones.
- preds$correct: to classify each prediction as correct or not correct, we compare the error to a .5 cutoff. If the error is small (< .5) we call the prediction correct; otherwise it is considered not correct. The cutoff is somewhat arbitrary; it simply determines which category each prediction is placed in.
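The three columns described above can be sketched as follows. This is only a minimal illustration on made-up data: the toy values and the column names `outcome` and `pred` are assumptions, not the tutorial's actual dataset or variable names. Only `grp`, `error`, `errorsqr`, and `correct` come from the text.

```r
# Hypothetical toy data standing in for the combined predictions.
# grp = 1 is training, grp = 0 is testing; `outcome` and `pred` are assumed names.
preds <- data.frame(
  grp     = c(1, 1, 1, 0, 0, 0),
  outcome = c(1, 0, 1, 0, 1, 0),             # observed 0/1 outcome
  pred    = c(0.9, 0.2, 0.4, 0.1, 0.7, 0.6)  # predicted probability of the event
)

# Absolute difference between the outcome and the predicted probability
preds$error <- abs(preds$outcome - preds$pred)

# Squared error
preds$errorsqr <- preds$error^2

# 1 if the error is below the .5 cutoff, 0 otherwise
preds$correct <- ifelse(preds$error < 0.5, 1, 0)

# Mean of each statistic within the training and testing groups
aggregate(cbind(error, errorsqr, correct) ~ grp, data = preds, FUN = mean)
```

Averaging `correct` within each group gives the share of predictions that fall on the right side of the cutoff, which makes it easy to compare training and testing accuracy side by side.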
As a final step, we will once again separate the data back into...