HDFS sink
The job of the HDFS sink is to continuously open a file in HDFS, stream data into it, and at some point close that file and start a new one. As we discussed in Chapter 1, Overview and Architecture, the time between file rotations must be balanced against how quickly files are closed in HDFS, since closing a file is what makes the data visible for processing. As we've discussed, having lots of tiny files as input will make your MapReduce jobs inefficient.
To use the HDFS sink, set the type parameter on your named sink to hdfs:

agent.sinks.k1.type=hdfs
This defines an HDFS sink named k1 for the agent named agent. There are some additional required parameters you need to specify, starting with the path in HDFS where you want to write the data:

agent.sinks.k1.hdfs.path=/path/in/hdfs
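Putting these together, a minimal sketch of a complete sink definition might look like the following. The channel name c1 and the roll values are illustrative assumptions; hdfs.rollInterval, hdfs.rollSize, and hdfs.rollCount are the sink's file-rotation controls, governing the trade-off discussed above:

```
# Minimal HDFS sink sketch (names and values are illustrative)
agent.sinks.k1.type=hdfs
agent.sinks.k1.hdfs.path=/path/in/hdfs
# Bind the sink to a channel (assumes a channel named c1 is defined)
agent.sinks.k1.channel=c1
# Rotation: roll every 10 minutes or at ~128 MB, whichever comes first;
# disable event-count-based rolling
agent.sinks.k1.hdfs.rollInterval=600
agent.sinks.k1.hdfs.rollSize=134217728
agent.sinks.k1.hdfs.rollCount=0
```

Longer roll intervals produce fewer, larger files, but increase the delay before data becomes visible to downstream jobs.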
This HDFS path, like most file paths in Hadoop, can be specified in three different ways: absolute, absolute with server name, and relative. These are all equivalent (assuming your Flume agent runs as the flume user):
Type | Path
---|---
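For instance, assuming the flume user's HDFS home directory is /user/flume (Hadoop's default of /user/&lt;username&gt;) and a NameNode listening at namenode:8020 (both illustrative), these three settings would all point at the same location:

```
# Absolute
agent.sinks.k1.hdfs.path=/user/flume/mydata
# Absolute with server name (assumes NameNode at namenode:8020)
agent.sinks.k1.hdfs.path=hdfs://namenode:8020/user/flume/mydata
# Relative (resolved against the running user's HDFS home directory)
agent.sinks.k1.hdfs.path=mydata
```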