Monitoring Hadoop with Ganglia
While Nagios or any other operational monitoring system will alert you when things go wrong, it is also very useful to graph cluster metrics and explore trends over time. Ganglia is an open source package designed specifically to monitor large clusters. It exposes the collected data through a web interface and can aggregate metrics across multiple machines.
To have Hadoop send its metrics to the Ganglia stats collection daemons, add the following options to /etc/hadoop/conf/hadoop-metrics2.properties:
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm
Additionally, you will need to point all sinks to your Ganglia collector server:
namenode.sink.ganglia.servers=gangliahost:8649
datanode.sink.ganglia.servers=gangliahost...
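A typo in one of the servers entries is easy to miss, so it can help to check the file before restarting the daemons. The following is a minimal sketch of such a check, not part of Hadoop or Ganglia. The helper names parse_properties and ganglia_servers are hypothetical, and the script assumes every servers value is a comma-separated list of host:port pairs as in the example above, which may be stricter than what Hadoop itself accepts.

```python
import re

# Matches keys such as "namenode.sink.ganglia.servers" or "*.sink.ganglia.servers".
SERVERS_KEY = re.compile(r"^(?P<prefix>[\w*]+)\.sink\.ganglia\.servers$")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, since values may contain "=" themselves.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def ganglia_servers(props):
    """Map each daemon prefix to its list of (host, port) collector addresses.

    Assumption: each entry has the form host:port. Raises ValueError otherwise.
    """
    result = {}
    for key, value in props.items():
        match = SERVERS_KEY.match(key)
        if not match:
            continue
        endpoints = []
        for item in value.split(","):
            host, sep, port = item.strip().rpartition(":")
            if not sep or not host or not port.isdigit():
                raise ValueError(f"{key}: expected host:port, got {item!r}")
            endpoints.append((host, int(port)))
        result[match.group("prefix")] = endpoints
    return result


if __name__ == "__main__":
    # Path taken from the text above; adjust it for your installation.
    with open("/etc/hadoop/conf/hadoop-metrics2.properties") as handle:
        servers = ganglia_servers(parse_properties(handle.read()))
    for prefix, endpoints in sorted(servers.items()):
        print(prefix, endpoints)
```

Run on a cluster node, the script prints one line per daemon prefix with the collector addresses it found, and stops with an error on the first entry that does not look like host:port.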