Why Flume?
This section explains why we chose Flume as the technology for the Data Acquisition layer capability that handles streaming (real-time) data.
In the following subsections, we first look at Flume's history and then at its advantages and disadvantages. The advantages are the main reasons we chose this technology for transferring real-time data into Hadoop.
History of Flume
Apache Flume was developed by Cloudera to handle and move large amounts of data into Hadoop. The company wanted newly produced data to reach the Hadoop system with minimal or no delay (near real time, NRT, or real time) so that various analyses could be carried out on it. That is how this elegant piece of technology came into existence.
As detailed in the previous section, it was initially conceived and developed for one particular use case: collecting and aggregating log data from various sources (web servers) into Hadoop for...