Time for action – writing network traffic onto HDFS
This discussion of Flume in a book about Hadoop hasn't actually used Hadoop at all so far. Let's remedy that by writing data onto HDFS via Flume.
Create the following file as agent4.conf within the Flume working directory:

agent4.sources = netsource
agent4.sinks = hdfssink
agent4.channels = memorychannel

agent4.sources.netsource.type = netcat
agent4.sources.netsource.bind = localhost
agent4.sources.netsource.port = 3000

agent4.sinks.hdfssink.type = hdfs
agent4.sinks.hdfssink.hdfs.path = /flume
agent4.sinks.hdfssink.hdfs.filePrefix = log
agent4.sinks.hdfssink.hdfs.rollInterval = 0
agent4.sinks.hdfssink.hdfs.rollCount = 3
agent4.sinks.hdfssink.hdfs.fileType = DataStream

agent4.channels.memorychannel.type = memory
agent4.channels.memorychannel.capacity = 1000
agent4.channels.memorychannel.transactionCapacity = 100

agent4.sources.netsource.channels = memorychannel
agent4.sinks.hdfssink.channel = memorychannel
Start the agent:
$ flume-ng agent...
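The command above is abbreviated; a plausible full invocation, assuming agent4.conf sits in the current directory and Flume's configuration directory is ./conf (adjust both for your installation):

```shell
# Start the agent; --name must match the agent4 prefix in the file.
flume-ng agent --conf conf --conf-file agent4.conf --name agent4

# In a second terminal, send a few events to the netcat source;
# with rollCount = 3, every third event closes the current HDFS file.
printf 'event one\nevent two\nevent three\n' | nc localhost 3000

# List the rolled files on HDFS; a file still being written
# carries the default .tmp in-use suffix until it is closed.
hadoop fs -ls /flume
```

These commands need a running HDFS and the flume-ng launcher on the PATH, so treat them as a sketch of the workflow rather than a copy-paste script.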