Introducing Apache Flume
Flume, found at http://flume.apache.org, is another Apache project with tight Hadoop integration, and we will explore it for the remainder of this chapter.
Before we explain what Flume can do, let's make it clear what it is not. Flume is described as a system for the retrieval and distribution of logs, meaning line-oriented textual data. It is not a generic data-distribution platform; in particular, do not plan to use it for retrieving or moving binary data.
However, since the vast majority of the data processed in Hadoop is line-oriented text of this kind, Flume is likely to meet many of your data retrieval needs.
Note
Flume is also not a generic data serialization framework like Avro, which we used in Chapter 5, Advanced MapReduce Techniques, or similar technologies such as Thrift and Protocol Buffers. As we'll see, Flume makes assumptions about the data format and provides no way to serialize data that falls outside those assumptions.
Flume provides mechanisms for retrieving...