Setting up infrastructure for data ingestion
There are multiple tools and frameworks available on the market for data ingestion. Within the scope of this book, we will discuss the following:
- Apache Kafka
- Apache NiFi
- Logstash
- Fluentd
- Apache Flume
Apache Kafka
Kafka is a message broker that can be connected to virtually any real-time framework available on the market. In this book, we will use Kafka frequently across all types of examples, typically as a data source that holds data read from files in queues for further processing. Download Kafka from https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.1.1/kafka_2.11-0.10.1.1.tgz to your local machine. Once the kafka_2.11-0.10.1.1.tgz file is downloaded, extract it using the following commands:
cp kafka_2.11-0.10.1.1.tgz /home/ubuntu/demo/kafka
cd /home/ubuntu/demo/kafka
tar -xvf kafka_2.11-0.10.1.1.tgz
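If the archive unpacks cleanly, you should end up with a kafka_2.11-0.10.1.1 directory. The listing below is indicative of the standard 0.10.1.1 distribution; exact contents may differ slightly:

kafka_2.11-0.10.1.1/
├── LICENSE
├── NOTICE
├── bin/        # startup and utility scripts (kafka-server-start.sh, kafka-topics.sh, ...)
├── config/     # broker, ZooKeeper, and client configuration (server.properties, zookeeper.properties, ...)
├── libs/       # Kafka and dependency JARs
└── site-docs/  # bundled documentation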
The extracted files and folders can be seen in the following screenshot:
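To make Kafka's role as a file-backed queue concrete, the following sketch pipes the lines of a file into a topic and reads them back. It assumes ZooKeeper and the Kafka broker have already been configured and started (their configuration is covered next); the topic name file-data and the sample file path are placeholders chosen purely for illustration:

# create a topic to hold the file's records (assumes ZooKeeper on localhost:2181)
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic file-data

# push each line of a sample file into the topic (assumes the broker on localhost:9092)
bin/kafka-console-producer.sh --broker-list localhost:9092 \
    --topic file-data < /home/ubuntu/demo/sample-data.txt

# read the queued records back for further processing
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic file-data --from-beginning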
Note
Change the listeners property in the server.properties file. It should be...