Apache Flume: Distributed Log Collection for Hadoop

Product type Book
Published in Jul 2013
Publisher Packt
ISBN-13 9781782167914
Pages 108 pages
Edition 1st Edition

Index

A

  • agent
    • about / Sources, channels, and sinks
  • agent.channels.access / Flume configuration file overview
  • agent.channels property / Flume configuration file overview
  • agent command / Starting up with "Hello World"
  • agent process
    • monitoring / Monitoring the agent process
  • Apache Avro serializer
    • about / Apache Avro
  • Apache Bigtop project
    • URL / Flume in Hadoop distributions
  • avro-client parameter / Command-line Avro
  • Avro Source
    • about / Avro Source/Sink
    • using / Avro Source/Sink
    • command-line / Command-line Avro
  • avro_event serializer / Event serializers

B

  • batchSize property / The exec source, The spooling directory source, The multiport syslog TCP source
  • best effort (BE)
    • about / Flume 0.9
  • bufferMaxLines property / The spooling directory source
  • byteCapacity, configuration parameter / Memory channel
  • byteCapacityBufferPercentage, configuration parameter / Memory channel

C

  • -c parameter / Starting up with "Hello World"
  • capacity, configuration parameter / Memory channel, File channel
  • channel
    • about / Sources, channels, and sinks
  • channel parameter / HDFS sink
  • channel selector
    • about / Channel selectors
    • replicating channel selector / Replicating
    • multiplexing channel selector / Multiplexing
  • channels parameter / The syslog TCP source
  • channels property / The exec source, The spooling directory source, The syslog UDP source, The multiport syslog TCP source
  • charset.default property / The multiport syslog TCP source
  • charset.port.PORT# property / The multiport syslog TCP source
  • checkpointDir, configuration parameter / File channel
  • checkpointInterval, configuration parameter / File channel
  • Cloudera
    • URL / Flume in Hadoop distributions
  • codecs
    • about / Compression codecs
  • command property / The exec source
  • CompressedStream file type / Compressed stream

D

  • --dirname option / Command-line Avro
  • -Dflume.root.logger property / Starting up with "Hello World"
  • data flows
    • tiering / Tiering data flows
  • data
    • routing / Routing
  • dataDir path / File channel
  • dataDirs, configuration parameter / File channel
  • disk failover (DFO)
    • about / Flume 0.9

E

  • Elastic Search
    • about / Tiered data collection (multiple flows and/or agents)
  • end-to-end (E2E)
    • about / Flume 0.9
  • event / Flume events, Interceptors
  • Event serializer
    • about / Event serializers
    • Text output / Text output
    • Text with headers / Text with headers
    • Apache Avro / Apache Avro
    • File type / File type
    • timeouts and workers / Timeouts and workers
  • eventSize parameter / The syslog TCP source
  • eventSize property / The multiport syslog TCP source
  • excludeEvents property / Regular expression filtering
  • exec source
    • about / The exec source
    • type property / The exec source
    • channels property / The exec source
    • command property / The exec source
    • restart property / The exec source
    • restartThrottle property / The exec source
    • logStdErr property / The exec source
    • batchSize property / The exec source

F

  • Facility, header key / The syslog UDP source, The multiport syslog TCP source
  • facility, header key / The syslog TCP source
  • failover
    • about / Failover
  • File Channel
    • about / File channel
    • using / File channel
    • configuration parameters / File channel
    • checkpointDir, configuration parameter / File channel
    • dataDirs, configuration parameter / File channel
    • capacity, configuration parameter / File channel
    • keep-alive, configuration parameter / File channel
    • transactionCapacity, configuration parameter / File channel
    • checkpointInterval, configuration parameter / File channel
    • write-timeout, configuration parameter / File channel
    • maxFileSize, configuration parameter / File channel
    • minimumRequiredSpace, configuration parameter / File channel
  • fileHeaderKey property / The spooling directory source
  • fileHeader property / The spooling directory source
  • fileSuffix property / The spooling directory source
  • File Type
    • about / File type
    • SequenceFile file type / Sequence file
    • Data stream file type / Data stream
    • Compressed stream file type / Compressed stream
  • Flume
    • events / Flume events
    • URL / Downloading Flume
    • downloading / Downloading Flume
    • in Hadoop distributions / Flume in Hadoop distributions
    • configuration file, overview / Flume configuration file overview
    • event / Interceptors
  • Flume-NG
    • about / Flume 1.X (Flume-NG)
  • flume.client.log4j.log.level / Log4J Appender
  • flume.client.log4j.logger.name / Log4J Appender
  • flume.client.log4j.logger.other / Log4J Appender
  • flume.client.log4j.message.encoding / Log4J Appender
  • flume.client.log4j.timestamp / Log4J Appender
  • flume.monitoring.hosts property / Ganglia
  • flume.monitoring.isGanglia3 property / Ganglia
  • flume.monitoring.pollInterval property / Ganglia
  • flume.monitoring.port property / The internal HTTP server
  • flume.monitoring.type property / Ganglia, The internal HTTP server, Custom monitoring hooks
  • flume.syslog.status, header key / The syslog UDP source, The syslog TCP source, The multiport syslog TCP source
  • Flume 0.9
    • about / Flume 0.9
  • Flume 1.X
    • about / Flume 1.X (Flume-NG)
  • Flume JVM
    • URL / Monitoring performance metrics

G

  • Ganglia
    • about / Ganglia
    • URL / Ganglia

H

  • --headerFile option / Command-line Avro
  • Hadoop distributions
    • Flume / Flume in Hadoop distributions
  • HDFS
    • issues / The problem with HDFS and streaming data/logs
    • about / Tiered data collection (multiple flows and/or agents)
  • hdfs.batchSize parameter / HDFS sink
  • hdfs.callTimeout property / Timeouts and workers
  • hdfs.codeC parameter / HDFS sink
  • hdfs.filePrefix parameter / HDFS sink
  • hdfs.fileSuffix parameter / HDFS sink
  • hdfs.fileSuffix property
    • about / Path and filename, Compression codecs
  • hdfs.fileType property / Sequence file, Data stream
  • hdfs.idleTimeout property / Timeouts and workers
  • hdfs.inUsePrefix parameter / HDFS sink
  • hdfs.inUseSuffix parameter / HDFS sink
  • hdfs.maxOpenFiles parameter / HDFS sink
  • hdfs.path parameter / HDFS sink
  • hdfs.rollCount parameter / HDFS sink
  • hdfs.rollInterval parameter / HDFS sink
  • hdfs.rollSize parameter / HDFS sink
  • hdfs.rollSize rotation / File rotation
  • hdfs.rollTimerPoolSize property / Timeouts and workers
  • hdfs.round parameter / HDFS sink
  • hdfs.roundUnit parameter / HDFS sink
  • hdfs.roundValue parameter / HDFS sink
  • hdfs.threadsPoolSize property / Timeouts and workers
  • hdfs.timeZone parameter / HDFS sink
  • hdfs.timeZone property / Path and filename
  • hdfs.writeType property / Sequence file
  • HDFS Sink
    • about / HDFS sink
    • using / HDFS sink
    • absolute / HDFS sink
    • absolute with server name / HDFS sink
    • relative / HDFS sink
    • type parameter / HDFS sink
    • channel parameter / HDFS sink
    • hdfs.path parameter / HDFS sink
    • hdfs.filePrefix parameter / HDFS sink
    • hdfs.fileSuffix parameter / HDFS sink
    • hdfs.maxOpenFiles parameter / HDFS sink
    • hdfs.round parameter / HDFS sink
    • hdfs.roundValue parameter / HDFS sink
    • hdfs.roundUnit parameter / HDFS sink
    • hdfs.timeZone parameter / HDFS sink
    • hdfs.inUsePrefix parameter / HDFS sink
    • hdfs.inUseSuffix parameter / HDFS sink
    • hdfs.rollInterval parameter / HDFS sink
    • hdfs.rollSize parameter / HDFS sink
    • hdfs.rollCount parameter / HDFS sink
    • hdfs.batchSize parameter / HDFS sink
    • hdfs.codeC parameter / HDFS sink
    • path and filename / Path and filename
    • file rotation / File rotation
  • Hello World example
    • about / Starting up with "Hello World"
  • help command / Starting up with "Hello World"
  • Hortonworks
    • URL / Flume in Hadoop distributions
  • hostHeader property / Host
  • Host interceptor
    • about / Host
    • type property / Host
    • hostHeader property / Host
    • preserveExisting property / Host
    • useIP property / Host
  • hostname, header key / The syslog UDP source, The syslog TCP source, The multiport syslog TCP source
  • host parameter / The syslog TCP source
  • host property / The syslog UDP source, The multiport syslog TCP source
  • HTTP server
    • about / The internal HTTP server
    • flume.monitoring.type property / The internal HTTP server
    • flume.monitoring.port property / The internal HTTP server

I

  • interceptor / Interceptors, channel selectors, and sink processors
  • interceptors
    • about / Interceptors
    • Timestamp interceptor / Timestamp
    • Host interceptor / Host
    • Static interceptor / Static
    • regular expression filtering interceptor / Regular expression filtering
    • regular expression extractor interceptor / Regular expression extractor
    • custom interceptors / Custom interceptors
  • interceptors property / Interceptors

K

  • keep-alive, configuration parameter / Memory channel, File channel
  • keep-alive parameter / Memory channel
  • key property / Static

L

  • load balancing
    • about / Load balancing
  • LoadBalancingLog4jAppender class / The Load Balancing Log4J Appender
  • Log4J Appender
    • about / Log4J Appender
    • properties / Log4J Appender
    • Flume headers / Log4J Appender
    • flume.client.log4j.logger.name / Log4J Appender
    • flume.client.log4j.log.level / Log4J Appender
    • flume.client.log4j.timestamp / Log4J Appender
    • flume.client.log4j.message.encoding / Log4J Appender
    • flume.client.log4j.logger.other / Log4J Appender
    • load balancing / The Load Balancing Log4J Appender
  • logStdErr property / The exec source
  • log time
    • versus transport time / Transport time versus log time

M

  • MapR
    • URL / Flume in Hadoop distributions
  • MaxBackoff property / The Load Balancing Log4J Appender
  • maxBufferLineLength property / The spooling directory source
  • maxFileSize, configuration parameter / File channel
  • Memory Channel
    • about / Memory channel
    • configuration parameters / Memory channel
    • capacity, configuration parameter / Memory channel
    • type, configuration parameter / Memory channel
    • transactionCapacity, configuration parameter / Memory channel
    • byteCapacityBufferPercentage, configuration parameter / Memory channel
    • byteCapacity, configuration parameter / Memory channel
    • keep-alive, configuration parameter / Memory channel
  • memory channel / Flume configuration file overview
  • minimumRequiredSpace, configuration parameter / File channel
  • Monit
    • about / Monit
    • URL / Monit
  • multiple data centers
    • considerations / Considerations for multiple data centers
  • multiplexing channel selector
    • about / Multiplexing
  • multiport syslog TCP source
    • about / The multiport syslog TCP source
    • type property / The multiport syslog TCP source
    • channels property / The multiport syslog TCP source
    • ports property / The multiport syslog TCP source
    • eventSize property / The multiport syslog TCP source
    • portHeader property / The multiport syslog TCP source
    • batchSize property / The multiport syslog TCP source
    • readBufferSize property / The multiport syslog TCP source
    • numProcessors property / The multiport syslog TCP source
    • charset.default property / The multiport syslog TCP source
    • charset.port.PORT# property / The multiport syslog TCP source
    • Facility, header key / The multiport syslog TCP source
    • priority, header key / The multiport syslog TCP source
    • timestamp, header key / The multiport syslog TCP source
    • hostname, header key / The multiport syslog TCP source
    • flume.syslog.status, header key / The multiport syslog TCP source

N

  • Nagios
    • about / Nagios
    • URL / Nagios
  • Nagios JMX
    • URL / Monitoring performance metrics
    • flume.monitoring.type property / Ganglia
    • flume.monitoring.hosts property / Ganglia
    • flume.monitoring.pollInterval property / Ganglia
    • flume.monitoring.isGanglia3 property / Ganglia
  • name / Flume configuration file overview
  • netcat / Starting up with "Hello World"
  • numProcessors property / The multiport syslog TCP source

O

  • org.apache.flume.interceptor.Interceptor interface / Custom interceptors

P

  • PCI
    • about / Compliance and data expiry
  • PII
    • about / Compliance and data expiry
  • portHeader property / The multiport syslog TCP source
  • port parameter / The syslog TCP source
  • port property / The syslog UDP source
  • ports property / The multiport syslog TCP source
  • POSIX
    • about / The problem with HDFS and streaming data/logs
  • preserveExisting property / Timestamp, Host, Static
  • priority, header key / The syslog UDP source, The syslog TCP source, The multiport syslog TCP source
  • processor.backoff property / Load balancing
  • processor.maxPenality / Failover
  • processor.maxpenality property / Failover
  • processor.priority.NAME property / Failover
  • processor.priority property / Failover
  • processor.selector property / Load balancing
  • processor.type property / Load balancing, Failover

R

  • readBufferSize property / The multiport syslog TCP source
  • Red Hat Enterprise Linux (RHEL) / Flume in Hadoop distributions
  • regex property / Regular expression filtering, Regular expression extractor
  • regular expression
    • filtering / Regular expression filtering
  • regular expression extractor interceptor
    • about / Regular expression extractor
    • properties / Regular expression extractor
    • type property / Regular expression extractor
    • regex property / Regular expression extractor
    • serializers property / Regular expression extractor
    • serializers.NAME.name property / Regular expression extractor
    • serializers.NAME.type property / Regular expression extractor
    • serializers.NAME.PROP property / Regular expression extractor
  • regular expression filtering interceptor
    • about / Regular expression filtering
    • properties / Regular expression filtering
    • type property / Regular expression filtering
    • regex property / Regular expression filtering
    • excludeEvents property / Regular expression filtering
  • relayHost header / Host
  • replicating channel selector
    • about / Replicating
  • restart property / The exec source
  • restartThrottle property / The exec source
  • RFC 3164
    • URL / Syslog sources
  • RFC 5424
    • URL / Syslog sources
  • routing
    • about / Routing
  • rsyslog
    • URL / Syslog sources

S

  • selector.header property / Multiplexing
  • selector.type property / Channel selectors
  • serializer.appendNewLine property / Text output
  • serializer.compressionCodec property / Apache Avro
  • serializer.syncIntervalBytes property / Apache Avro
  • serializer property / Text output
  • serializers
    • about / Regular expression extractor
  • serializers.NAME.name property / Regular expression extractor
  • serializers.NAME.PROP property / Regular expression extractor
  • serializers.NAME.type property / Regular expression extractor
  • serializers property / Regular expression extractor
  • Sink
    • about / Sources, channels, and sinks
  • Sink groups
    • about / Sink groups
    • load balancing / Load balancing
    • failover / Failover
  • source
    • about / Sources, channels, and sinks
  • SOX
    • about / Compliance and data expiry
  • spoolDir property / The spooling directory source
  • spooling directory source
    • about / The spooling directory source
    • type property / The spooling directory source
    • channels property / The spooling directory source
    • spoolDir property / The spooling directory source
    • fileSuffix property / The spooling directory source
    • fileHeader property / The spooling directory source
    • fileHeaderKey property / The spooling directory source
    • batchSize property / The spooling directory source
    • bufferMaxLines property / The spooling directory source
    • maxBufferLineLength property / The spooling directory source
  • start() method / Custom monitoring hooks
  • Static interceptor
    • about / Static
    • type property / Static
    • key property / Static
    • value property / Static
    • preserveExisting property / Static
    • properties / Static
  • stop() method / Custom monitoring hooks
  • syslog sources
    • about / Syslog sources
    • syslog UDP source / The syslog UDP source
    • syslog TCP source / The syslog TCP source
    • multiport syslog TCP source / The multiport syslog TCP source
  • syslog TCP source
    • about / The syslog TCP source
    • type parameter / The syslog TCP source
    • channels parameter / The syslog TCP source
    • port parameter / The syslog TCP source
    • host parameter / The syslog TCP source
    • eventSize parameter / The syslog TCP source
    • facility, header key / The syslog TCP source
    • priority, header key / The syslog TCP source
    • timestamp, header key / The syslog TCP source
    • hostname, header key / The syslog TCP source
    • flume.syslog.status, header key / The syslog TCP source
  • syslog UDP source
    • about / The syslog UDP source
    • type property / The syslog UDP source
    • channels property / The syslog UDP source
    • port property / The syslog UDP source
    • host property / The syslog UDP source
    • Facility, header key / The syslog UDP source
    • priority, header key / The syslog UDP source
    • timestamp, header key / The syslog UDP source
    • hostname, header key / The syslog UDP source
    • flume.syslog.status, header key / The syslog UDP source

T

  • tail
    • about / The problem with using tail
  • tail -F command / The exec source
  • TailSource
    • about / The problem with using tail
  • Text output serializer / Text output
  • text_with_headers serializer / Text with headers
  • timestamp, header key / The syslog UDP source, The syslog TCP source, The multiport syslog TCP source
  • Timestamp interceptor
    • properties / Timestamp
    • type property / Timestamp
    • preserveExisting property / Timestamp
  • time zones
    • about / Time zones are evil
  • transactionCapacity, configuration parameter / Memory channel, File channel
  • transport time
    • versus log time / Transport time versus log time
  • type, configuration parameter / Memory channel
  • type parameter / HDFS sink, The syslog TCP source
  • type property / The exec source, The spooling directory source, The syslog UDP source, The multiport syslog TCP source, Host, Static, Regular expression filtering, Regular expression extractor

U

  • useIP property / Host

V

  • value property / Static
  • version command / Starting up with "Hello World"

W

  • write-timeout, configuration parameter / File channel
  • Write Ahead Log (WAL) / File channel