Implementing logistic regression
For this recipe, we will implement logistic regression to predict the probability of breast cancer using the Breast Cancer Wisconsin dataset (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)). We will be predicting the diagnosis from features that are computed from a digitized image of a fine needle aspiration (FNA) of a breast mass. An FNA is a common breast cancer test, consisting of a small tissue biopsy that can be examined under a microscope.
The dataset can immediately be used for a classification model, without further transformations, since the target variable consists of 357 benign cases and 212 malignant ones. The two classes do not have the exact same consistency (an important requirement when doing binary classification with regression models), but they are not extremely different, allowing us to build a straightforward example and evaluate it using plain accuracy.
Please remember to check whether...