The dataset used here illustrates a real-life application of Decision Trees in machine learning. We use a cancer dataset to predict whether a patient's case is malignant or benign. To explore the real power of decision trees, we chose a medical dataset that exhibits real-life non-linearity with a complex error surface.
Getting and preparing real-world medical data for exploring Decision Trees and Ensemble models in Spark 2.0
How to do it...
The Wisconsin Breast Cancer dataset was obtained from Dr. William H. Wolberg at the University of Wisconsin Hospital. The dataset was collected periodically, growing as Dr. Wolberg reported his clinical cases.
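To make the preparation step concrete, here is a minimal sketch in plain Python (no Spark required) of turning one raw record into a label and a feature vector. The column layout is an assumption, not something stated above: it follows the commonly distributed form of the file, with a sample ID, nine features scored 1 to 10, and a class code of 2 for benign or 4 for malignant, and with `?` marking a missing value. The function name `parse_record` and the sample rows are hypothetical and exist only for illustration.

```python
# Assumed layout (not confirmed by the text above):
#   column 0     -> sample ID (dropped, it carries no predictive signal)
#   columns 1-9  -> nine features, each scored 1-10
#   column 10    -> class code: 2 = benign, 4 = malignant
# A "?" in any column is treated as a missing value.

def parse_record(line):
    """Return (label, features) for one comma-separated row, or None if unusable."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 11 or "?" in fields:
        return None  # skip malformed rows and rows with missing values
    features = [float(v) for v in fields[1:10]]
    label = 1.0 if fields[10] == "4" else 0.0  # malignant -> 1.0, benign -> 0.0
    return label, features


# Hypothetical rows written in the assumed format, for illustration only.
sample_rows = [
    "1,5,1,1,1,2,1,3,1,1,2",
    "2,8,10,10,8,7,10,9,7,1,4",
    "3,8,4,5,1,2,?,7,3,1,4",
]

parsed = [parse_record(row) for row in sample_rows]
clean = [p for p in parsed if p is not None]
```

Rows with missing values are simply dropped here; imputing them is an equally reasonable choice. The resulting pairs of label and features map directly onto the labeled-vector input that tree-based learners expect.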
The dataset can be retrieved from multiple sources, and is available directly from the University...