Downloading wine quality data for streaming regression
In this recipe, we download and inspect the wine quality dataset from the UCI Machine Learning Repository to prepare data for the streaming linear regression algorithm in Spark's MLlib.
How to do it...
You will need one of the following command-line tools, curl or wget, to retrieve the data:
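If you are not sure which of the two tools is installed, the following sketch uses curl when it is available and falls back to wget otherwise. It is an illustration, not part of the original recipe: the function name fetch_file is hypothetical, and the sketch assumes a POSIX shell that supports command -v.

```shell
#!/bin/sh
# Minimal sketch: download a URL to a local file with curl if it is
# installed, otherwise with wget. fetch_file is a hypothetical helper
# name chosen for this example.
fetch_file() {
    url="$1"
    out="$2"
    if command -v curl >/dev/null 2>&1; then
        # -f: return an error on HTTP failures instead of saving the error page
        # -sS: no progress meter, but still print errors
        # -L: follow redirects
        curl -fsSL "$url" -o "$out"
    elif command -v wget >/dev/null 2>&1; then
        # -q: quiet; -O: write the download to the given file name
        wget -q "$url" -O "$out"
    else
        echo "error: neither curl nor wget is installed" >&2
        return 1
    fi
}
```

For this recipe you would call it as fetch_file http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv winequality-white.csv. The -f flag makes curl exit with a non-zero status on an HTTP error, so a failed download is not silently saved as the CSV file.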
- You can start by downloading the dataset in any of the following three ways. The first uses wget:
wget http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv
You can also use the following command:
curl http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv -o winequality-white.csv
The third way is to open the following URL in a web browser and save the file:
http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv
- Now we begin our first steps of data exploration by seeing how the data in winequality-white.csv is formatted:
head -5 winequality-white.csv
"fixed acidity...
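For a look beyond the first few lines, the sketch below lists the header's column names, counts the data rows, and checks that every row has as many fields as the header. It assumes, as in the UCI wine quality files, that fields are separated by semicolons and that header names are double-quoted; the function names are hypothetical helpers for this example, not part of the recipe.

```shell
#!/bin/sh
# Minimal sketch for inspecting a semicolon-separated file with a quoted
# header row. list_columns, count_rows and check_field_counts are
# hypothetical helper names chosen for this example.

# Print the header's column names one per line, stripping quotes and any
# Windows-style carriage returns.
list_columns() {
    head -n 1 "$1" | tr ';' '\n' | tr -d '"\r'
}

# Count the data rows, i.e. every line after the header.
count_rows() {
    tail -n +2 "$1" | wc -l | tr -d ' '
}

# Print every data row whose field count differs from the header's,
# and return a non-zero status if any such row exists.
check_field_counts() {
    awk -F';' 'NR == 1 { n = NF; next }
               NF != n { printf "line %d has %d fields, expected %d\n", NR, NF, n; bad = 1 }
               END { exit bad }' "$1"
}
```

Run against the downloaded file, list_columns winequality-white.csv prints the feature names and the quality label, count_rows winequality-white.csv prints the number of samples, and check_field_counts winequality-white.csv reports any malformed rows before the data is handed to Spark.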