Batch data analytics
Now let's look at the implementation of batch data analytics, which consists of two important elements:
Loading streams of sensor data from Kafka topics to HDFS.
Using Hive to perform analytics on inserted data.
Loading streams of sensor data from Kafka topics to HDFS
Let's assume that the sensors can write data to Kafka topics. Microcomputers such as the Raspberry Pi can be used to implement the interface between the sensors and Kafka. In this section, we are going to see how to read the data from Kafka topics and write it to an HDFS folder.
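On the device side, the sensor-to-Kafka interface can be sketched as a small producer script. The following Python sketch uses the kafka-python library; the topic name sensor comes from this section, but the JSON message layout (device, ts, value fields), the device ID, and the read_sensor() helper are illustrative assumptions, not part of the original setup:

```python
import json
import time

def read_sensor():
    # Placeholder: on a Raspberry Pi this would call the real sensor driver.
    return 23.5

def encode_reading(device_id, value, ts=None):
    # Serialize one reading as JSON bytes; these field names are an
    # assumed message layout, not prescribed by Kafka.
    reading = {
        "device": device_id,
        "ts": ts if ts is not None else int(time.time()),
        "value": value,
    }
    return json.dumps(reading, sort_keys=True).encode("utf-8")

def publish_readings(bootstrap="localhost:9092", topic="sensor", count=10):
    # Requires `pip install kafka-python` and a reachable broker.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    for _ in range(count):
        producer.send(topic, encode_reading("pi-01", read_sensor()))
        time.sleep(1)
    producer.flush()
```

Keeping the serialization in its own function makes the message format easy to test and to keep consistent with whatever schema the downstream Hive tables expect.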
To import the data from Kafka, first you need to have Kafka running on your machine. The following commands start ZooKeeper and Kafka:
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
Next, we create a topic called sensor, which we will be listening to:
bin/kafka-topics.sh --create --zookeeper <ip>:2181 --replication-factor 1 --partitions...
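Once the topic is in place, the Kafka-to-HDFS leg can be sketched in Python. Everything below is an illustrative assumption rather than the book's pipeline: the batch_messages helper, the HDFS landing directory, and the use of the `hdfs dfs -put` CLI are stand-ins, and production setups often delegate this step to a dedicated tool such as Kafka Connect or Flume:

```python
import subprocess
import tempfile

def batch_messages(messages, batch_size):
    # Group raw messages into fixed-size batches so each HDFS file
    # holds many records instead of one tiny file per message.
    batch = []
    for msg in messages:
        batch.append(msg)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def land_batches(topic="sensor", hdfs_dir="/data/sensor", batch_size=1000):
    # Requires `pip install kafka-python`, a reachable broker, and the
    # Hadoop client on the PATH; the paths here are illustrative.
    from kafka import KafkaConsumer
    consumer = KafkaConsumer(topic, bootstrap_servers="localhost:9092")
    raw = (msg.value.decode("utf-8") for msg in consumer)
    for i, batch in enumerate(batch_messages(raw, batch_size)):
        # Write the batch to a local temp file, then copy it into HDFS.
        with tempfile.NamedTemporaryFile("w", suffix=".json",
                                         delete=False) as f:
            f.write("\n".join(batch) + "\n")
            local_path = f.name
        subprocess.run(["hdfs", "dfs", "-put", local_path,
                        "%s/batch-%05d.json" % (hdfs_dir, i)], check=True)
```

Batching matters here because HDFS (and Hive on top of it) performs poorly with large numbers of small files; landing a few large files per interval keeps later queries efficient.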