Monitoring MapReduce
In the current Hadoop implementation, all the metrics required to monitor MapReduce status can be obtained at the JobTracker level. There is no reason to monitor individual TaskTrackers, at least not at the alert level. Instead, a periodic report on the number of alive and dead TaskTrackers should be sent out to track the overall health of the framework.
JobTracker checks
The following is the list of host-level resources to monitor on a JobTracker:
Check if the server is reachable using ping. Type: critical
Check disk space on the log and system volumes. JobTracker doesn't preserve state on the local filesystem, but being unable to write to the log files because of low disk space will cause issues. Type: critical
Check swap usage on the server. Type: critical
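The disk space and swap checks above could be scripted along the following lines. This is only a minimal sketch, not a replacement for a monitoring system's own plugins: the function names and the threshold argument are hypothetical, and the swap check assumes a Linux host where the text comes from /proc/meminfo.

```python
import shutil


def disk_used_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the volume holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


def swap_used_fraction(meminfo_text):
    """Return the used fraction of swap, given the text of /proc/meminfo.

    Returns 0.0 when no swap is configured.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # Values look like "SwapTotal:  2097148 kB"; keep the number only.
            fields[name.strip()] = int(parts[0])
    total = fields.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - fields.get("SwapFree", 0)) / total


def is_critical(fraction, threshold):
    """Hypothetical alert rule: critical once usage reaches the threshold."""
    return fraction >= threshold
```

For example, is_critical(disk_used_fraction("/var/log"), 0.9) would flag a log volume that is at least 90 percent full. Both the path and the 90 percent threshold are placeholders to adjust for your own servers.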
The following checks are specific to the JobTracker process:
Monitor memory usage. You can monitor JobTracker memory usage by checking the HeapMemoryUsage.used and HeapMemoryUsage.max variables. Type: critical
Checking the SummaryJson...
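As an illustration of the heap memory check described above, the following sketch computes the ratio of HeapMemoryUsage.used to HeapMemoryUsage.max from a JMX dump in JSON form. It assumes the metrics are available as a JSON document with a top-level "beans" list that contains the standard java.lang:type=Memory bean. Whether and where your JobTracker exposes such a document depends on the Hadoop version, so treat the input format and the function name as assumptions.

```python
import json


def heap_usage_fraction(jmx_json_text):
    """Return used/max heap as a fraction, or None if it cannot be determined.

    Assumes a JSON document shaped like {"beans": [{"name": ..., ...}, ...]}.
    """
    payload = json.loads(jmx_json_text)
    for bean in payload.get("beans", []):
        if bean.get("name") == "java.lang:type=Memory":
            heap = bean["HeapMemoryUsage"]
            used, maximum = heap["used"], heap["max"]
            if maximum <= 0:
                # The JVM reports -1 when the maximum heap size is undefined.
                return None
            return used / maximum
    return None
```

A monitoring script could raise a critical alert when the returned fraction stays above a chosen limit for several consecutive samples. A single high reading may only mean that a garbage collection is about to run.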