Test your knowledge
Before moving on to the next chapter, test your knowledge with the following questions:
- Assume that, on top of the default EMR configuration, you need to install a few additional libraries and then run a few scripts after installation. This process must be repeated every time a new instance is added to the cluster. How will you implement this when launching your cluster?
- You have a running EMR cluster with one Hive job and one Spark job configured to run in sequence as EMR steps. You have noticed that step 2, the Spark job, is failing. On further analysis, you have found that all but one of the Spark job's tasks have completed, and the remaining task runs for a long time, which slows down the whole process. How will you resolve this problem?
- Your organization has compliance policies that require all application logs to be retained for at least a year. You are going to integrate EMR for one of your transient cluster use...