Do It Yourself
We will build a use case using filters, group by, and aggregators. The use case finds the top 10 devices that generate the most data in a batch. Here is the pseudocode:
- Write a data generator that publishes events with fields such as phone number, bytes in, and bytes out
- The data generator will publish the events to Kafka
- Write a topology program that will:
  - Get the events from Kafka
  - Apply a filter to exclude specific phone numbers from taking part in the top 10
  - Split each event on the comma
  - Perform a group by operation to bring events with the same phone number together
  - Perform an aggregation that sums bytes in and bytes out together
  - Apply an assembly with the FirstN function, which requires the field name and the number of elements to be calculated
  - Finally, display the result on the console
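Before wiring these steps into a topology, it can help to see the same logic in plain Java. The following is only a minimal sketch: it uses Java collections in place of Kafka and the topology API, and the class name TopDevicesSketch, the methods formatEvent and topDevices, the event layout phoneNumber,bytesIn,bytesOut, and the sample phone numbers are all assumptions made for illustration, not part of the code bundle.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class TopDevicesSketch {

    // Hypothetical event layout: phoneNumber,bytesIn,bytesOut
    static String formatEvent(String phone, long bytesIn, long bytesOut) {
        return phone + "," + bytesIn + "," + bytesOut;
    }

    // Returns up to n (phone number, total bytes) pairs, largest total first.
    static List<Map.Entry<String, Long>> topDevices(
            List<String> events, Set<String> excluded, int n) {
        Map<String, Long> totals = events.stream()
                // Filter: drop events whose phone number is excluded
                .filter(e -> !excluded.contains(e.split(",")[0]))
                // Split: break each event on the comma
                .map(e -> e.split(","))
                // Skip malformed events that do not have three fields
                .filter(f -> f.length == 3)
                // Group by phone number and sum bytes in plus bytes out
                .collect(Collectors.groupingBy(
                        f -> f[0],
                        Collectors.summingLong(
                                f -> Long.parseLong(f[1].trim())
                                        + Long.parseLong(f[2].trim()))));

        // Keep the first n entries, ordered by total bytes descending
        // (ties are broken by phone number so the order is deterministic)
        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for one batch of events published by the data generator
        List<String> batch = Arrays.asList(
                formatEvent("9990001", 100, 200),
                formatEvent("9990002", 500, 500),
                formatEvent("9990001", 50, 50),
                formatEvent("9990003", 10, 10));

        // Display the result on the console
        for (Map.Entry<String, Long> e
                : topDevices(batch, Collections.singleton("9990003"), 10)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

In the real topology, each stage of this stream pipeline corresponds to one of the steps listed above, with the FirstN assembly taking the place of the final sort and limit.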
You will find the code in the code bundle for reference.
Program:
package com.book.chapter8.diy;
The following code snippet contains the import statements:
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.kafka...