We have already built an understanding of Apache Flume in Chapter 3, Hadoop Ecosystem. To recall, Apache Flume is a framework that helps move large amounts of streaming data from one place to another. It is primarily designed to collect and aggregate logs from many servers into a centralized store such as Hadoop for processing and analysis. However, its usage is not limited to log aggregation: because its data source connectors can be customized, Flume can transport large volumes of event-generated data, such as network traffic data, social media feeds, and almost any other type of data source.
Let's now move on to the practical part, where we set up Flume to collect data from a server and place it in the Hadoop directory structure.
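Before the hands-on walkthrough, a minimal sketch of what such a setup typically looks like may help. This is an illustrative Flume agent configuration, not the exact one used later in the chapter; the agent name (agent1), the spooling directory, and the NameNode address are placeholder assumptions. It watches a local directory on the source server and writes any files dropped there into a date-partitioned path in HDFS.

```
# Hypothetical Flume agent configuration (agent1.conf); names and paths are placeholders.
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

# Source: pick up files dropped into a local spooling directory
agent1.sources.src1.type     = spooldir
agent1.sources.src1.spoolDir = /var/log/incoming
agent1.sources.src1.channels = ch1

# Channel: buffer events in memory between the source and the sink
agent1.channels.ch1.type     = memory
agent1.channels.ch1.capacity = 10000

# Sink: write events into HDFS, partitioned by date
agent1.sinks.sink1.type                   = hdfs
agent1.sinks.sink1.channel                = ch1
agent1.sinks.sink1.hdfs.path              = hdfs://namenode:8020/data/flume/%Y-%m-%d
agent1.sinks.sink1.hdfs.fileType          = DataStream
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
```

With such a file in place, the agent would be started with the standard flume-ng launcher, for example: flume-ng agent --conf conf --conf-file agent1.conf --name agent1. The source, channel, and sink types shown here are stock Flume components; the chapter's own example may use a different source, but the overall source-channel-sink wiring stays the same.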