Now that we have built our first decision tree, it's time to turn our attention to a real dataset: the Breast Cancer Wisconsin dataset (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)).
This dataset came directly out of medical imaging research and is now considered a classic. It was created from digitized images of healthy (benign) and cancerous (malignant) tissue. Unfortunately, I wasn't able to find any public-domain examples from the original study, but the images look similar to the following screenshot:
The goal of the research was to classify each tissue sample as either benign or malignant (a binary classification task).
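To get a first feel for what this binary task looks like in practice, here is a minimal sketch. It assumes scikit-learn is installed and uses the copy of the dataset that scikit-learn bundles through `load_breast_cancer`, rather than downloading the files from the UCI page; the variable names are our own choice for illustration.

```python
from sklearn.datasets import load_breast_cancer

# Assumption: we use the copy of the dataset bundled with scikit-learn,
# so nothing has to be downloaded from the UCI repository.
data = load_breast_cancer()

# Each row is one tissue sample; each column is one numeric feature.
n_samples, n_features = data.data.shape

# The target holds one integer label per sample. The integers index
# into target_names, which gives the two class names.
class_names = list(data.target_names)
labels = sorted(set(data.target.tolist()))

print("samples:", n_samples, "features:", n_features)
print("classes:", class_names, "encoded as:", labels)
```

Because there are only two distinct labels, any classifier we train on this data only has to decide between the two classes for each sample.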
To make the classification task feasible, the researchers performed feature extraction on the images, as we did in Chapter 4, Representing Data and Engineering...