In this section, we will create a Hello World program for Spark and then gain some understanding of Spark's internals. In the big data world, the Hello World program is known as the Word Count program: given text data as input, it calculates the frequency of each word, that is, how many times each word appears in the text. Consider the following text data:
Where there is a will there is a way
The number of occurrences of each word in this data is:
Word | Frequency
Where | 1
there | 2
is | 2
a | 2
will | 1
way | 1
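Before turning to Spark, the counting logic can be sketched in plain Python. This is not Spark code. The function name word_count is hypothetical. Its three steps only mirror the split, pair-with-1 and sum-per-word stages that a Spark word count commonly expresses with flatMap, map and reduceByKey. Words are split on whitespace and compared case-sensitively, so "Where" and "where" would be counted separately.

```python
def word_count(lines):
    """Count word occurrences (hypothetical helper, plain Python, not Spark).

    Splits on whitespace and is case-sensitive.
    """
    # Split step (like flatMap): break every line into words and flatten them into one list
    words = [word for line in lines for word in line.split()]
    # Pair step (like map): pair each word with an initial count of 1
    pairs = [(word, 1) for word in words]
    # Sum step (like reduceByKey): add up the counts for each distinct word
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts


if __name__ == "__main__":
    text = ["Where there is a will there is a way"]
    # Dicts keep insertion order, so words print in order of first appearance
    for word, count in word_count(text).items():
        print(word, count)
```

Run on the sample sentence, this reproduces the frequencies in the table above. In plain Python, collections.Counter(words) would give the same counts in a single call. The explicit pair-and-sum form is shown only because it resembles the key/value style a distributed implementation tends to use.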
Now let's solve this problem with Spark by developing a Spark WordCount application.