Understanding the cluster life cycle
When you launch an EMR cluster through the AWS SDK, command-line interface (CLI), or console, it follows a series of steps: it provisions the required infrastructure resources, runs any bootstrap actions you have defined, and then installs the libraries for the applications you selected.
The following is the sequence of steps the cluster follows to complete the setup successfully:
- Provision the EC2 instances for the cluster's master, core, and task nodes, using the default AMI or the custom AMI you have specified. During this phase, the cluster shows the status as STARTING.
- Run the bootstrap actions that you have specified to install custom third-party libraries, perform additional configuration on the instances, or start specific services. During this phase, the cluster shows the status as BOOTSTRAPPING.
- Install libraries related to the Hadoop services (Hive, Pig, Hue, Spark, HBase, Tez, and so on) you have selected during the cluster launch. After completion...
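
To watch a cluster move through these phases, you can poll its state. The following is a minimal sketch, not a definitive implementation. It assumes a client object that exposes describe_cluster(ClusterId=...) and returns the state under Cluster, Status, State, which is the shape the boto3 EMR client uses. The helper name wait_for_setup, its polling parameters, and the grouping of states into the two sets are illustrative choices, not part of any AWS API.

```python
import time

# States in which the setup phases described above have finished.
SETUP_DONE = {"RUNNING", "WAITING"}
# States that mean the cluster is shutting down or has shut down.
FAILED = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_for_setup(emr_client, cluster_id, poll_seconds=30, max_polls=120):
    """Poll the cluster until setup finishes.

    Returns the distinct states observed, in order
    (for example STARTING, BOOTSTRAPPING, RUNNING).
    `emr_client` is assumed to behave like boto3.client("emr").
    """
    seen = []
    for _ in range(max_polls):
        response = emr_client.describe_cluster(ClusterId=cluster_id)
        state = response["Cluster"]["Status"]["State"]

        # Record and report each state only when it changes.
        if not seen or seen[-1] != state:
            seen.append(state)
            print(f"cluster {cluster_id}: {state}")

        if state in SETUP_DONE:
            return seen
        if state in FAILED:
            raise RuntimeError(
                f"cluster {cluster_id} left setup in state {state}"
            )
        time.sleep(poll_seconds)

    raise TimeoutError(
        f"cluster {cluster_id} did not finish setup after {max_polls} polls"
    )
```

With boto3 installed and credentials configured, this could be called as wait_for_setup(boto3.client("emr"), cluster_id), where cluster_id is the ID of your own cluster. boto3 also provides a built-in cluster_running waiter that blocks until the cluster is up; the explicit loop here is used only to make the individual state transitions visible.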