Understanding EMR Clusters and Nodes
The central component of Amazon EMR is the cluster. A cluster is a collection of Amazon EC2 instances, and each instance in the cluster is called a node. Each node has a role within the cluster, referred to as the node type, and Amazon EMR installs different software components on each node type. As you can see in the following diagram, the node types are the leader node, the core node, and the task node.
![Figure 10.1 – Amazon EMR node type](https://static.packt-cdn.com/products/9781803238951/graphics/media/file70.png)
As shown in the preceding diagram, each node type has the following role:
Leader node:
- Manages the cluster by running software components to coordinate data distribution and tasks among other nodes for processing.
- Tracks the status of tasks and monitors the health of the cluster. Every cluster has a leader node, and it is possible to create a single-node cluster that consists of only the leader node.
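To make the single-node case concrete, here is a minimal Python sketch that builds the `Instances` argument accepted by the boto3 EMR client's `run_job_flow` call. The EMR API refers to the leader node by the `MASTER` instance role, and to core and task nodes by `CORE` and `TASK`. The helper name `build_instances_config` and the default `m5.xlarge` instance type are hypothetical choices for illustration, and the code only builds the dictionary; it does not contact AWS.

```python
from typing import Any, Dict, List


def build_instances_config(
    instance_type: str = "m5.xlarge",  # assumed instance type, for illustration only
    core_count: int = 0,
    task_count: int = 0,
) -> Dict[str, Any]:
    """Hypothetical helper: build an Instances dict for EMR run_job_flow.

    Always includes one leader node (instance role "MASTER"); core and task
    instance groups are added only when their counts are greater than zero.
    """
    if core_count < 0 or task_count < 0:
        raise ValueError("node counts must be non-negative")

    # Every cluster has a leader node; this sketch uses a single one.
    groups: List[Dict[str, Any]] = [
        {
            "Name": "Leader",
            "InstanceRole": "MASTER",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        }
    ]
    # Optional core and task groups; omitting both yields a single-node cluster.
    for name, role, count in (("Core", "CORE", core_count), ("Task", "TASK", task_count)):
        if count > 0:
            groups.append(
                {
                    "Name": name,
                    "InstanceRole": role,
                    "InstanceType": instance_type,
                    "InstanceCount": count,
                }
            )

    # Keep the cluster running after its steps finish.
    return {"InstanceGroups": groups, "KeepJobFlowAliveWhenNoSteps": True}


single_node = build_instances_config()
print([g["InstanceRole"] for g in single_node["InstanceGroups"]])  # ['MASTER']
```

Passing the single-node dictionary as `Instances=` to `boto3.client("emr").run_job_flow(...)`, together with other arguments such as `Name`, `ReleaseLabel`, `JobFlowRole`, and `ServiceRole`, would request a cluster made up of only the leader node; raising `core_count` or `task_count` adds the corresponding instance groups.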
Core node:
- The leader node manages core nodes.
- Runs tasks and stores data in your cluster...