Getting and preparing real-world medical data for exploring Decision Trees and Ensemble models in Spark 2.0
The dataset used depicts a real-life application of decision trees in machine learning. We use a cancer dataset to predict whether a patient's case is malignant or not. To explore the real power of decision trees, we use a medical dataset that exhibits real-life non-linearity with a complex error surface.
How to do it...
The Wisconsin Breast Cancer dataset was obtained from Dr. William H. Wolberg at the University of Wisconsin Hospital. The dataset was gathered periodically as Dr. Wolberg reported his clinical cases.
The dataset can be retrieved from multiple sources and is available directly from the University of California, Irvine's web server: http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data
The data is also available from the University of Wisconsin's web server ftp://ftp.cs.wisc.edu/math-prog/cpo-dataset/machine-learn/cancer/cancer1/datacum...
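Once the file has been downloaded, each record must be turned into a label and a feature vector before a tree model can use it. The following is a minimal sketch of that preparation step in plain Python, not the recipe's own code. It assumes the layout documented for the UCI copy of the file: 11 comma-separated fields per line (a sample ID, nine integer attributes, and a class field where 2 means benign and 4 means malignant), with missing values marked by `?`. The names `Sample`, `parse_line`, and `parse_lines` are hypothetical, and the rows are illustrative records in the assumed format rather than output fetched from the server.

```python
from typing import Iterable, List, NamedTuple, Optional


class Sample(NamedTuple):
    label: float           # 1.0 = malignant, 0.0 = benign
    features: List[float]  # the nine attributes that follow the ID


def parse_line(line: str) -> Optional[Sample]:
    """Parse one comma-separated record; return None for unusable rows."""
    fields = line.strip().split(",")
    # Assumed layout: id, nine features, class (11 fields in total).
    # Rows with a missing value ('?') are dropped rather than imputed.
    if len(fields) != 11 or "?" in fields:
        return None
    # Assumed class coding: 4 = malignant, 2 = benign.
    label = 1.0 if fields[10] == "4" else 0.0
    # Skip the ID in column 0; it carries no predictive information.
    features = [float(value) for value in fields[1:10]]
    return Sample(label, features)


def parse_lines(lines: Iterable[str]) -> List[Sample]:
    """Parse many records, keeping only the rows that parsed cleanly."""
    parsed = (parse_line(line) for line in lines)
    return [sample for sample in parsed if sample is not None]


# Illustrative rows in the assumed format (not read from the server).
rows = [
    "1000025,5,1,1,1,2,1,3,1,1,2",
    "1057013,8,4,5,1,2,?,7,3,1,4",
    "1017122,8,10,10,8,7,10,9,7,1,4",
]
samples = parse_lines(rows)
```

Dropping incomplete rows is the simplest choice for a sketch; imputing the missing attribute would be a reasonable alternative. With a downloaded copy of the file, the same function can be applied to its lines, for example `parse_lines(open(path))` where `path` is wherever the file was saved.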