Using Kafka Connect
As mentioned, Kafka Connect is a framework for connecting Kafka with external systems such as key-value stores (for example, Riak, Coherence, and Dynamo), databases (Cassandra), search indexes (Elasticsearch), and filesystems (HDFS).
This book has a whole chapter on Kafka connectors; this recipe appears here because Kafka Connect is part of the Confluent Platform.
Getting ready
The Confluent Platform should be up and running:
$ confluent log connect
How to do it...
To read a data file with Kafka Connect:
- To list the installed connectors:
$ confluent list connectors
Bundled Predefined Connectors (edit configuration under etc/):
elasticsearch-sink
file-source
file-sink
jdbc-source
jdbc-sink
hdfs-sink
s3-sink
- The configuration file is located at ./etc/kafka/connect-file-source.properties. It has these values:
- The instance name:
name=file_source
- The connector implementation class:
connector.class=FileStreamSource
- The maximum number of tasks for this connector instance:
tasks.max=1
- The input file:
file=continuous...
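As a quick sanity check on the settings listed so far, the following minimal Python sketch parses properties-style text and reports any of the four keys (name, connector.class, tasks.max, file) that are missing. The helper names parse_properties and missing_keys are hypothetical, and the file value in the sample is a placeholder chosen for illustration, not the value used in this recipe.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' at all
            props[key.strip()] = value.strip()
    return props


# Sample contents for illustration only. The "file" value is a
# hypothetical placeholder, not the path used in this recipe.
SAMPLE = """\
# file source connector settings
name=file_source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/example-input.txt
"""

# The four keys discussed in this recipe so far.
REQUIRED = ("name", "connector.class", "tasks.max", "file")


def missing_keys(props):
    """Return the required keys that are absent from props, in order."""
    return [key for key in REQUIRED if key not in props]


props = parse_properties(SAMPLE)
print("connector class:", props["connector.class"])
print("max tasks:", int(props["tasks.max"]))
print("missing keys:", missing_keys(props))
```

To check the real file, the same helper could be applied to the text read from ./etc/kafka/connect-file-source.properties, assuming the file uses plain key=value lines.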