Understanding EMR Clusters and Nodes
The central component of Amazon EMR is the cluster. A cluster is a collection of Amazon EC2 instances, and each instance in the cluster is called a node. Each node has a role within the cluster, referred to as the node type, and Amazon EMR installs different software components on each node type to give it that role. As shown in the following diagram, there are three node types: the leader node, core nodes, and task nodes.
As shown in the preceding diagram, each node type has the following role:
Leader node:
- Manages the cluster by running software components that coordinate the distribution of data and tasks among the other nodes for processing.
- Tracks the status of tasks and monitors the health of the cluster.
- Every cluster has a leader node, and it is possible to create a single-node cluster that consists of only the leader node.
Core node:
- The leader node manages core nodes.
- Run tasks and store data in your cluster...
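The node roles above map onto the instance groups you request when creating a cluster. The following is a minimal sketch, assuming Python with boto3 as the client; it only builds the request dictionary, so it runs without AWS credentials. The function name `emr_cluster_request`, the instance type `m5.xlarge`, and the release label `emr-6.15.0` are illustrative assumptions. Note that the EMR API names the leader role `MASTER` (current AWS documentation calls it the primary node).

```python
import json
from typing import Any, Dict


def emr_cluster_request(name: str, core_count: int = 0,
                        instance_type: str = "m5.xlarge") -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR ``run_job_flow`` call.

    core_count == 0 describes a single-node cluster (leader node only);
    core_count > 0 adds a core group that runs tasks and stores data.
    Hypothetical helper for illustration; instance type is an assumption.
    """
    if core_count < 0:
        raise ValueError("core_count must be non-negative")

    # The EMR API calls the leader node's role "MASTER".
    groups = [{
        "Name": "Leader",
        "InstanceRole": "MASTER",
        "InstanceType": instance_type,
        "InstanceCount": 1,
    }]
    if core_count:
        # Core nodes are managed by the leader node; they run tasks and hold data.
        groups.append({
            "Name": "Core",
            "InstanceRole": "CORE",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        })

    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumption: choose a release available to you
        "Instances": {
            "InstanceGroups": groups,
            # Keep the cluster running after any submitted steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default role names created by `aws emr create-default-roles`.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    # Print the request for a single-node cluster; nothing is sent to AWS.
    print(json.dumps(emr_cluster_request("single-node-demo"), indent=2))
```

To actually launch a cluster (which incurs charges), the dictionary could be unpacked into the client call, for example `boto3.client("emr").run_job_flow(**emr_cluster_request("demo", core_count=2))`. Keeping request construction separate from the API call makes the node layout easy to inspect and test before anything is created.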