Data Processing
In this chapter, we will look at the following key topics:
- Challenges with data processing platforms
- Data processing using Amazon EMR
- Data processing using AWS Glue
- Data processing using AWS Glue DataBrew
Let’s quickly recap what we have covered so far in this book. We set the foundation by creating the layers of a data lake on Amazon S3. The layers represent distinct storage areas that keep all the data in one centralized location. The next piece of the puzzle we solved was getting data from disparate sources into the raw layer of the data lake in S3. Then, we spent the whole of Chapter 3 looking at batch data ingestion mechanisms, followed by Chapter 4, where we discussed streaming data ingestion mechanisms.
So, up to this point, all the data is in the raw layer of S3. Of course, it can also go directly to the conformed layer if you processed and optimized the data on the fly during ingestion. If you recall...