Accessing and loading datasets
In this section, we will review some publicly available datasets and cover methods of loading them into Spark. Then, we will review several methods of exploring and visualizing these datasets in Spark.
After completing this section, we will be able to find suitable datasets, load them into Spark, and start to explore and visualize the data.
Accessing publicly available datasets
Just as there is an open source movement to make software free, there is also a very active open data movement that has made a large number of datasets freely accessible to researchers and analysts. Worldwide, many governments make the datasets they collect open to the public. For example, http://www.data.gov/ offers more than 140,000 datasets that can be used freely, spanning areas such as agriculture, finance, and education.
In addition to open data from government organizations, many research institutions also collect very useful datasets and...