Exploring breast cancer traits using Decision Trees
One of the first problems we face when we receive a dataset is deciding what to analyze first. At the very beginning, it is common to feel lost about where to start. Here, we will present an exploratory approach based on Decision Trees. The big advantage of Decision Trees is that they expose the decision rules they learned from the data, giving us a first, tentative understanding of what is going on with our data.
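To make the idea of readable rules concrete, here is a minimal, hand-rolled sketch of a Decision Tree learner in pure Python. In practice you would normally use a library such as scikit-learn. The sketch greedily picks the threshold split that minimizes Gini impurity (the criterion used by CART-style trees) and then flattens the tree into if/then rules. The feature names and the ten toy observations are hypothetical, invented only for illustration; they are not taken from the dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature_index, threshold) pair with the lowest weighted
    Gini impurity, or None if no split improves on the unsplit node."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively grow a tree; a leaf is just the majority class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(row, y) for row, y in zip(rows, labels) if row[f] <= t]
    right = [(row, y) for row, y in zip(rows, labels) if row[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def rules(tree, names, conditions=()):
    """Flatten the tree into human-readable 'if ... then ...' rules."""
    if not isinstance(tree, tuple):
        return [f"if {' and '.join(conditions) or 'always'} then {tree}"]
    f, t, left, right = tree
    return rules(left, names, conditions + (f"{names[f]} <= {t}",)) + rules(
        right, names, conditions + (f"{names[f]} > {t}",)
    )


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical toy data: (clump_thickness, cell_size_uniformity) on a 1-10 scale.
NAMES = ("clump_thickness", "cell_size_uniformity")
rows = [(1, 1), (2, 1), (3, 2), (2, 3), (8, 7), (9, 8), (7, 9), (10, 10), (4, 8), (3, 9)]
labels = ["benign"] * 4 + ["malignant"] * 6

tree = build_tree(rows, labels)
for rule in rules(tree, NAMES):
    print(rule)
# if cell_size_uniformity <= 3 then benign
# if cell_size_uniformity > 3 then malignant
```

On this toy data a single threshold on one feature separates the two classes, and the printed rules state that directly; that readability is what makes Decision Trees attractive as a first exploratory tool.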
In this example, we will use a dataset of trait observations from patients with breast cancer. The dataset has 699 entries and includes information such as clump thickness, uniformity of cell size, and bland chromatin. The outcome is either a benign or a malignant tumor. The features are encoded as integer values from 1 to 10. More information about the dataset can be found at http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29.
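As a sketch of what one record looks like, assume the data comes as the comma-separated `breast-cancer-wisconsin.data` file from the UCI repository (an assumption; check the layout of the copy you download). Each line is then a sample ID, the nine features, and a class code of 2 (benign) or 4 (malignant), with `?` marking a missing value. The parser below uses only the standard library; the snake_case column names are our own labels, and the two sample lines are made up for illustration.

```python
import csv
import io

# Our own labels for the nine feature columns, in the assumed file order.
FEATURES = [
    "clump_thickness", "cell_size_uniformity", "cell_shape_uniformity",
    "marginal_adhesion", "single_epithelial_cell_size", "bare_nuclei",
    "bland_chromatin", "normal_nucleoli", "mitoses",
]
# Assumed class encoding: 2 = benign, 4 = malignant.
CLASSES = {"2": "benign", "4": "malignant"}


def parse_records(text):
    """Parse CSV text into (sample_id, features_dict, outcome) tuples.

    Missing values ('?') become None; all other features become ints.
    """
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        sample_id, *values, outcome = row
        if len(values) != len(FEATURES):
            raise ValueError(f"unexpected number of columns in row {row!r}")
        features = {
            name: None if value == "?" else int(value)
            for name, value in zip(FEATURES, values)
        }
        records.append((sample_id, features, CLASSES[outcome]))
    return records


# Two made-up lines in the assumed layout; the second has a missing bare_nuclei.
sample = "1001,5,1,1,1,2,1,3,1,1,2\n1002,8,10,10,8,7,?,9,7,1,4\n"

for sample_id, features, outcome in parse_records(sample):
    print(sample_id, outcome, features["bare_nuclei"])
# 1001 benign 1
# 1002 malignant None
```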
Getting ready
We are going...