Hello Spark
In this section, we will create a Hello World program for Spark and then look at some of Spark's internals. In the big data world, the Hello World program is usually a Word Count program. Given text data as input, we will calculate the frequency of each word, that is, how many times each word appears in the text. Consider the following text data:
Where there is a will there is a way
The number of occurrences of each word in this data is:
Word | Frequency |
Where | 1 |
There | 2 |
Is | 2 |
A | 2 |
Will | 1 |
Way | 1 |
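Before turning to Spark, it may help to see the computation itself in plain Java. The following is a minimal sketch, not the Spark application developed in this section: the class name WordCountSketch and the method countWords are hypothetical, and splitting on whitespace and lowercasing each word are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration class; it does not use Spark.
public class WordCountSketch {

    // Counts how often each whitespace-separated word occurs in the text.
    // Assumption: words are compared case-insensitively (lowercased first).
    public static Map<String, Long> countWords(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(word -> !word.isEmpty())            // skip the empty token produced by blank input
                .map(word -> word.toLowerCase(Locale.ROOT))
                .collect(Collectors.groupingBy(
                        word -> word,                       // group identical words together
                        LinkedHashMap::new,                 // keep first-seen order for readable output
                        Collectors.counting()));            // count the members of each group
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countWords("Where there is a will there is a way");
        counts.forEach((word, count) -> System.out.println(word + " | " + count));
    }
}
```

Running main on the sample sentence prints one word and its count per line, in the order the words first appear, with the same frequencies as the table above. The same split, group, and count steps are what a Spark version would perform in a distributed fashion.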
Now we will solve this problem with Spark by developing a Spark WordCount application.
Prerequisites
The following are the prerequisites for preparing the Spark application:
- IDE for Java applications: We will use the Eclipse IDE to develop the Spark application, but you can use any IDE you prefer. The IDE should also have the Maven plugin installed (the latest version of Eclipse such as Mars...