Importing data from Kafka into HDFS using Flume
Kafka is one of the most popular message queue systems in use today. Using Flume, we can listen to Kafka topics and write the message data directly into HDFS. Recent Flume versions support importing data from Kafka easily. In this recipe, we will learn how to import Kafka messages into HDFS.
Getting ready
To perform this recipe, you should have a Hadoop cluster running, as well as the latest version of Flume installed on it. Here I am using Flume 1.6. We also need Kafka installed and running on one of the machines. I am using kafka_2.10-0.9.0.0.
How to do it...
To import data from Kafka, you first need to have Kafka running on your machine. The following commands start ZooKeeper and Kafka:
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
Next, we create a topic called weblogs, which we will listen to:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication...
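To give a sense of where this is heading, a Flume agent configuration wiring a Kafka source to an HDFS sink might look like the following sketch. The agent name (agent1), component names, and HDFS path are illustrative assumptions, not the recipe's exact settings; the property keys follow Flume 1.6's KafkaSource and HDFS sink:

```properties
# Name the components of the agent (names are assumptions for this sketch)
agent1.sources = kafka-source
agent1.channels = mem-channel
agent1.sinks = hdfs-sink

# Kafka source: consumes from the weblogs topic via ZooKeeper (Flume 1.6 style)
agent1.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.kafka-source.zookeeperConnect = localhost:2181
agent1.sources.kafka-source.topic = weblogs
agent1.sources.kafka-source.channels = mem-channel

# In-memory channel buffering events between source and sink
agent1.channels.mem-channel.type = memory
agent1.channels.mem-channel.capacity = 10000

# HDFS sink: writes plain-text events under an illustrative path
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs://localhost:9000/flume/weblogs
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.channel = mem-channel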