Planning your next steps
Before we wrap up with a summary of the chapter, there are a few things I highly recommend you try out with Amazon EMR and Amazon Redshift. First up is EMRFS.
We briefly touched on EMRFS while deciding which filesystem to use when deploying the EMR cluster. The EMR File System (EMRFS) is an implementation of HDFS that lets Amazon EMR read and write files directly to Amazon S3. This allows you to leverage the consistency provided by S3, as well as some of its other features, such as data encryption. To read more about EMRFS and how you can use it with your EMR clusters, visit: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html.
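To make this a little more concrete, here is a minimal sketch in Python of how data is addressed through EMRFS: on an EMR cluster, an s3:// URI can be passed to the standard hadoop fs commands in much the same way as an hdfs:// path. The bucket name, the paths, and the two helper functions are hypothetical and exist only for illustration. The snippet just builds and prints the command line; it does not run it or contact AWS.

```python
import shlex


def emrfs_uri(bucket, key):
    """Build the s3:// URI used to address an object through EMRFS."""
    return "s3://{}/{}".format(bucket, key.lstrip("/"))


def hadoop_fs_copy(src, dst):
    """Return the argument list for `hadoop fs -cp SRC DST`."""
    return ["hadoop", "fs", "-cp", src, dst]


# Hypothetical bucket and paths, for illustration only.
source = "hdfs:///user/hadoop/output/part-00000"
target = emrfs_uri("my-example-bucket", "/emr/output/part-00000")

command = hadoop_fs_copy(source, target)

# Print the command as you would type it on the cluster's master node.
print(shlex.join(command))
```

On a real cluster, you would run the printed command from the master node, or submit it as a step. The point of the sketch is simply that the S3 location is treated like any other filesystem path.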
Second, Amazon EMR also provides an enterprise-grade Hadoop distribution in the form of MapR. The MapR distribution of Hadoop provides you with a plethora of features that enhance your overall experience when it comes to building distributed...