Olympics Athletes analytics using the Spark Shell
Spark provides an interactive Scala-based shell, which lets us process data one command at a time and inspect the results immediately. In this recipe, we are going to analyze a sample dataset that contains information about athletes who have participated in the Olympics.
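As a quick illustration (the exact invocation and any master or configuration flags depend on how Spark is installed on your cluster), the shell is typically launched as follows, dropping you into a Scala prompt with a preconfigured SparkContext available as sc:
$ spark-shell
scala> sc.version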
Getting ready
To perform this recipe, you should have Hadoop and Spark installed. You also need Scala installed; I am using Scala 2.11.0.
How to do it...
First of all, you need to download the data from https://github.com/deshpandetanmay/hadoop-real-world-cookbook/blob/master/data/OlympicAthletes.csv and store it in HDFS using the following commands:
$ hadoop fs -mkdir /athletes
$ hadoop fs -put OlympicAthletes.csv /athletes
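To confirm that the file has been copied, you can list the target directory:
$ hadoop fs -ls /athletes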
The following is some sample data from the file for your reference (a rough sketch showing how these columns could be parsed in the Spark shell follows the sample rows). The comma-separated file contains the following columns, in order: the athlete's name, country, year, the number of gold, silver, and bronze medals, and the total number of medals won by each athlete:
Yang Yilin...
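As a rough sketch of how this file could be read from the Spark shell (assuming the seven columns appear exactly in the order described above and that the file sits at the HDFS path used earlier; adjust the field indices if your copy of the file contains additional columns, and skip the header row first if it has one), you could map each line to a case class and, for example, compute the top countries by gold medals:
// Hypothetical parsing sketch, not the recipe's own code
case class Athlete(name: String, country: String, year: Int,
  gold: Int, silver: Int, bronze: Int, total: Int)

val lines = sc.textFile("hdfs:///athletes/OlympicAthletes.csv")
val athletes = lines.map(_.split(","))
  .filter(_.length == 7)                                   // drop malformed rows
  .map(f => Athlete(f(0).trim, f(1).trim, f(2).trim.toInt,
    f(3).trim.toInt, f(4).trim.toInt, f(5).trim.toInt, f(6).trim.toInt))

// Top five countries by total gold medals won
athletes.map(a => (a.country, a.gold))
  .reduceByKey(_ + _)
  .sortBy(_._2, ascending = false)
  .take(5)
  .foreach(println)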