Apache Flume is a service for feeding logs containing unstructured data into Hadoop. Flume works with virtually any type of data source: it can receive both log data and continuous event data, consuming events and incremental logs from sources such as application servers and social media feeds.
The following diagram illustrates how Flume works. When Flume receives an event, it is first persisted in a channel (a data store such as the local file system) before being removed and pushed to the target by a sink. A Flume target can be HDFS storage, Amazon S3, or a custom application:
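To make the source-channel-sink pipeline concrete, here is a minimal sketch of a single-agent Flume configuration, written as a standard Java properties file. The agent name `agent1`, the log path, and the HDFS URL are illustrative assumptions rather than values from this text; the component types (`exec`, `file`, `hdfs`) are standard Flume types:

```
# Name this agent's components (names are illustrative)
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Source: tail an application server log via the exec source
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

# Channel: persist events on the local file system for durability
agent1.channels.ch1.type = file
agent1.channels.ch1.checkpointDir = /var/flume/checkpoint
agent1.channels.ch1.dataDirs = /var/flume/data

# Sink: drain the channel into HDFS, bucketed by date
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
agent1.sinks.sink1.channel = ch1
```

Assuming this file is saved as example.conf, the agent can be started with the standard flume-ng launcher: `flume-ng agent --conf conf --conf-file example.conf --name agent1`.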
Flume also supports multiple Flume agents, as shown in the preceding data flow. Data can be collected, aggregated, and then processed through a multi-agent workflow that is fully customizable by the end user. Flume also provides message reliability: an event is removed from a channel only after it has been safely stored in the next agent's channel or in the terminal store.
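As a sketch of such a multi-agent flow, one common pattern chains agents over Avro: each upstream agent's Avro sink forwards events to a downstream agent's Avro source, which aggregates them before writing to HDFS. The agent names, host names, and port below are assumptions for illustration; the `avro` source and sink types are standard Flume components:

```
# --- Downstream agent "collector" (names and hosts are illustrative) ---
collector.sources = avroIn
collector.channels = memCh
collector.sinks = hdfsOut

# Avro source: receive events forwarded by upstream agents
collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4141
collector.sources.avroIn.channels = memCh

# Memory channel: fast in-memory buffering on the aggregation tier
collector.channels.memCh.type = memory
collector.channels.memCh.capacity = 10000

# HDFS sink: write the aggregated stream to a single location
collector.sinks.hdfsOut.type = hdfs
collector.sinks.hdfsOut.hdfs.path = hdfs://namenode:8020/flume/aggregated
collector.sinks.hdfsOut.hdfs.fileType = DataStream
collector.sinks.hdfsOut.channel = memCh

# --- On each upstream agent, an Avro sink points at the collector ---
# agent1.sinks.fwd.type = avro
# agent1.sinks.fwd.hostname = collector-host
# agent1.sinks.fwd.port = 4141
# agent1.sinks.fwd.channel = ch1
```

The memory channel on the collector trades durability for throughput; a file channel, as in the earlier single-agent sketch, could be substituted where stronger delivery guarantees are needed.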