Test your knowledge
Before moving on to the next chapter, test your knowledge with the following questions:
- You receive a daily incremental file from your source system at midnight and are expected to process it and make it available for consumption. Later that day, at 3 P.M., you need to run a machine learning job that reads this processed output. Would you use a persistent or a transient cluster, and how would you configure it?
- While creating an EMR cluster, you need to select multiple instance types for each node type and also want to take advantage of Spot Instances. How would you configure your cluster?
- You have a manufacturing unit that expects all Hadoop/Spark processing to happen near its on-premises site but plans to migrate to the cloud gradually. Which Amazon EMR deployment option is best suited to this requirement?