Monitoring your EMR cluster
You have the following options for monitoring your Amazon EMR cluster:
- Using the EMR console to get the overall cluster status, the health of nodes, and the high-level status of YARN, Hadoop, or Spark applications
- Analyzing logs generated by EMR and your big data applications, which might be stored on the master node or on the core and task nodes
- Accessing the web interfaces of different Hadoop applications to analyze job status or task execution, or using Ganglia to monitor the overall performance of your cluster
- Using Amazon CloudWatch for logging, monitoring, and integrating rule-based notifications
- Using AWS CloudTrail to audit the API calls made against your EMR cluster
We covered the first two options in the previous chapter, where we explained how to use the EMR console to monitor cluster status and how to access the logs available on the master node, as well as the logs archived to Amazon S3.
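To make the CloudWatch option from the preceding list more concrete, here is a minimal sketch of a rule-based notification for an idle cluster. It assumes the boto3 SDK is installed and AWS credentials are configured. The function names, the alarm name, the default of six evaluation periods, and the SNS topic ARN are illustrative choices and not something prescribed by EMR. The `IsIdle` metric, the `AWS/ElasticMapReduce` namespace, and the `JobFlowId` dimension are the ones EMR publishes to CloudWatch.

```python
from typing import Any, Dict


def idle_alarm_params(cluster_id: str, sns_topic_arn: str,
                      idle_periods: int = 6) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's put_metric_alarm call.

    The alarm fires when the cluster reports IsIdle for `idle_periods`
    consecutive 5-minute periods (the 30-minute default is an assumption
    chosen for illustration).
    """
    return {
        "AlarmName": f"emr-{cluster_id}-idle",   # hypothetical naming scheme
        "Namespace": "AWS/ElasticMapReduce",     # namespace for EMR metrics
        "MetricName": "IsIdle",                  # 1 when no work is running
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": 300,                           # seconds per evaluation period
        "EvaluationPeriods": idle_periods,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],         # SNS topic to notify
    }


def create_idle_alarm(cluster_id: str, sns_topic_arn: str,
                      region: str = "us-east-1") -> None:
    """Create the alarm in CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**idle_alarm_params(cluster_id, sns_topic_arn))
```

Calling `create_idle_alarm("j-EXAMPLE", "<your SNS topic ARN>")` with a real cluster ID and topic ARN would register the alarm. Keeping the parameter construction in its own function lets you inspect or adjust the settings before anything is sent to AWS.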
Now, let's...