In this recipe, we download and inspect the Pima Diabetes dataset from the UCI Machine Learning Repository. We will use the dataset later with Spark's streaming logistic regression algorithm.
Downloading Pima Diabetes data for supervised classification
How to do it...
You will need one of the following command-line tools, curl or wget, to retrieve the data:
- Start by downloading the dataset using either of the following two commands. The first command is as follows:
curl -O http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data
This is an alternative that you can use:
wget http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians...
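Once the download finishes, a quick sanity check of the file can be useful before handing it to Spark. The following is a minimal Python sketch, not part of the recipe itself. It assumes the file was saved in the current directory as pima-indians-diabetes.data, that it is comma-separated with no header row, and that the last column holds the class label. The helper name summarize is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def summarize(lines):
    """Return (row count, distinct row widths, label counts) for CSV lines.

    Assumes the class label is the last field of each row.
    """
    # Skip blank lines, which csv.reader yields as empty lists.
    rows = [row for row in csv.reader(lines) if row]
    widths = sorted({len(row) for row in rows})
    labels = Counter(row[-1].strip() for row in rows)
    return len(rows), widths, dict(labels)


if __name__ == "__main__":
    # Assumed local file name; adjust it if you saved the data elsewhere.
    path = Path("pima-indians-diabetes.data")
    if path.exists():
        with path.open(newline="") as handle:
            n_rows, widths, labels = summarize(handle)
        print(f"rows: {n_rows}")
        print(f"fields per row: {widths}")
        print(f"label counts: {labels}")
    else:
        print(f"{path} not found; run the download command first")
```

If the script reports more than one value for fields per row, the download was probably truncated or returned an error page instead of the data, and it is worth fetching the file again.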