Practice questions
Questions 1-8 are based on the data analytics pipeline in the AWS cloud shown in Figure 5.37. An engineer is designing a pipeline that will ingest high-volume streaming data from the web over the long term using Kinesis Data Streams and then make two copies of the data: one copy is passed to Kinesis Firehose and stored in an Amazon S3 bucket, and the other copy is processed with Amazon EMR, queried with Athena, and visualized using Amazon QuickSight. Performance and cost are the main factors to take into account.
Figure 5.37 – Data analytics pipeline in the AWS cloud
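To keep the two branches of the scenario straight while working through the questions, the pipeline can be sketched as a small directed graph. This is only an illustrative model: the stage names come from the scenario above, while the `PIPELINE` dictionary, the "Web sources" label, and the `paths` helper are hypothetical and not part of any AWS API.

```python
# Illustrative model only: stage names follow the scenario text; the
# dictionary layout and the paths() helper are assumptions for this sketch.
PIPELINE = {
    "Web sources": ["Kinesis Data Streams"],
    "Kinesis Data Streams": ["Kinesis Firehose", "Amazon EMR"],
    "Kinesis Firehose": ["Amazon S3"],
    "Amazon S3": [],
    "Amazon EMR": ["Athena"],
    "Athena": ["Amazon QuickSight"],
    "Amazon QuickSight": [],
}


def paths(graph, start):
    """Return every path from start to a stage with no downstream stage."""
    downstream = graph[start]
    if not downstream:
        return [[start]]
    return [[start] + rest for nxt in downstream for rest in paths(graph, nxt)]


for path in paths(PIPELINE, "Web sources"):
    print(" -> ".join(path))
```

Running the sketch lists one storage path ending in Amazon S3 and one analytics path ending in Amazon QuickSight, both fed by the same stream.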
1. What instances would you recommend for the EMR cluster?
A. Reserved Instances for the cluster
B. Spot Instances for core and task nodes and a Reserved Instance for the master node
C. Spot Instances for the cluster
D. On-Demand Instances for the cluster
2. What filesystem would you recommend for the EMR cluster?
A. HDFS with a consistent view
B...