Finding the top n ranked topics using Redis
The topology will compute a rolling ranking of the most popular words in the past 5 minutes. The word counts are stored in individual 60-second windows. The topology consists of the following components:
- Twitter stream spout (twitterstream.py): This reads tweets from the Twitter sample stream. This spout is unchanged from Chapter 4, Example Topology – Twitter.
- Splitter bolt (splitsentence.py): This receives tweets and splits them into words. This is also identical to the one in Chapter 4, Example Topology – Twitter.
- Rolling word count bolt (rollingcount.py): This receives words and counts the occurrences. The Redis keys look like twitter_word_count:<start time of current window in seconds>, and the values are stored in a hash using the following simple format: { "word1": 5, "word2": 3 }
This bolt uses the Redis expireat command to discard old data after 5 minutes. These lines of code perform the...
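As a rough illustration of the key scheme and expiry described for the rolling word count bolt, the following minimal sketch shows how a word might be counted into the hash for the current 60-second window. It is not the book's rollingcount.py. The names WINDOW_SECONDS, RETENTION_SECONDS, window_start, window_key, and count_word are hypothetical, and setting the expiry to the window start plus 5 minutes is an assumption. The client argument is assumed to be any object that offers hincrby and expireat with the same arguments as the redis-py client.

```python
import time

WINDOW_SECONDS = 60       # length of one counting window (from the text)
RETENTION_SECONDS = 300   # discard data after 5 minutes (from the text)
KEY_PREFIX = "twitter_word_count"


def window_start(now=None):
    """Return the start of the window containing `now`, in whole seconds."""
    now = time.time() if now is None else now
    return int(now) // WINDOW_SECONDS * WINDOW_SECONDS


def window_key(now=None):
    """Build a key of the form twitter_word_count:<window start in seconds>."""
    return "%s:%d" % (KEY_PREFIX, window_start(now))


def count_word(client, word, now=None):
    """Increment the count of `word` in the hash for the current window."""
    start = window_start(now)
    key = "%s:%d" % (KEY_PREFIX, start)
    # HINCRBY creates the hash and the field if they do not exist yet.
    client.hincrby(key, word, 1)
    # Assumption: the key expires 5 minutes after its window started.
    client.expireat(key, start + RETENTION_SECONDS)
    return key
```

With a running Redis server, a redis-py client object could be passed as client, for example count_word(client, "storm"). Because every key carries its own window start time, a reader can find the hashes covering the past 5 minutes by computing the same key names, and Redis removes older hashes on its own once they expire.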