Twitter trending topics using Spark Streaming
In the previous recipe, we took a look at the SQL integrations of Spark. In this recipe, we are going to explore yet another powerful module called Spark Streaming. As the name suggests, Spark Streaming can listen to a stream of events and process data as and when it arrives.
Getting ready
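To perform this recipe, you should have Hadoop and Spark installed. You also need to install Scala; I am using Scala 2.11.0. Finally, you need a Twitter account and the API credentials for a Twitter application: a consumer key, a consumer secret, an access token, and an access token secret.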
How to do it...
Spark Streaming supports input from various sources such as Flume, HDFS, Kafka, Twitter, and so on. In this recipe, we are going to use Spark Streaming's Twitter source to listen to a live stream of tweets and compute the top trending topics on Twitter.
To perform this recipe, we are going to write one Spark Streaming application in Scala.
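Before walking through the project, here is a minimal sketch of what such an application can look like. It is not the recipe's exact code: the object name TrendingTopicsSketch, the hashtags helper, the 10-second batch interval, the 60-second window, and the convention of passing the four Twitter credentials as command-line arguments are all assumptions made for illustration. It also assumes the spark-streaming-twitter artifact (shipped with Spark 1.x, later moved to Apache Bahir) is on the classpath, and it treats hashtags as a stand-in for trending topics.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object TrendingTopicsSketch {

  // Hypothetical helper: keep the whitespace-separated tokens that start with '#'.
  def hashtags(text: String): Seq[String] =
    text.split("\\s+").filter(_.startsWith("#")).toSeq

  def main(args: Array[String]): Unit = {
    // Assumption: the four credentials arrive as the first four arguments.
    // This line throws a MatchError if fewer than four are supplied.
    val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret) = args.take(4)

    // twitter4j, which the Twitter source uses, reads OAuth settings from these properties.
    System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
    System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
    System.setProperty("twitter4j.oauth.accessToken", accessToken)
    System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)

    // The master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("TrendingTopicsSketch")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second batches (assumed)

    // None tells the source to build its authorization from the properties set above.
    val tweets = TwitterUtils.createStream(ssc, None)

    // Count each hashtag over a sliding 60-second window (assumed window length).
    // On Spark versions before 1.3, also import org.apache.spark.streaming.StreamingContext._
    val counts = tweets
      .flatMap(status => hashtags(status.getText))
      .map(tag => (tag, 1))
      .reduceByKeyAndWindow(_ + _, Seconds(60))

    // Swap to (count, tag) so each batch can be sorted by count, highest first.
    val ranked = counts
      .map { case (tag, count) => (count, tag) }
      .transform(_.sortByKey(false))

    ranked.print(10) // print the first ten entries of every batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The window length must be a multiple of the batch interval, which is why 60 seconds is paired with 10-second batches here. Keeping the hashtag extraction in a small pure function makes that part easy to check without a running stream.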
To create the application, I am creating a folder called TwitterSpark, and it will have the following...