Putting AWS analytics services together
In Chapter 10, Bigdata and streaming data processing in AWS, you learned about AWS ETL services such as EMR and Glue. In this chapter, let's combine those services to build a data processing pipeline. The following diagram shows a data processing and analytics architecture in AWS that applies various analytics services to build an end-to-end solution.
As shown in the preceding diagram, data from various sources, such as operational systems, marketing systems, and other systems, is ingested into S3. You want to ingest data quickly without losing any of it, so the data is collected in its raw format first. You can then clean, process, and transform this data using an ETL platform such as EMR or Glue. Glue is recommended if you are using the Apache Spark framework and writing your data processing code from scratch; otherwise, you can use EMR if your team has a Hadoop skillset. Transformed data stored...
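The "land raw first, transform later" pattern described above can be illustrated with a minimal local sketch. It uses only the Python standard library and does not call any AWS service. The `raw/<source>/<yyyy>/<mm>/<dd>/` key layout, the `raw_key` and `transform` helpers, and the `id` and `amount` columns are all assumptions made for illustration; in practice the transform step would run as a Glue or EMR job.

```python
import csv
import io
from datetime import date


def raw_key(source: str, filename: str, day: date) -> str:
    """Build a date-partitioned object key for the raw landing zone.

    The prefix layout is a hypothetical convention, not an AWS requirement.
    """
    return f"raw/{source}/{day:%Y/%m/%d}/{filename}"


def transform(raw_csv: str) -> list[dict]:
    """Clean raw CSV text: trim whitespace, drop rows with no id, cast amount."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Normalize header names and values; ignore overflow columns (key None).
        cleaned = {
            k.strip(): (v or "").strip()
            for k, v in row.items()
            if k is not None
        }
        if not cleaned.get("id"):
            continue  # a record without an id cannot be joined downstream
        cleaned["amount"] = float(cleaned["amount"] or 0)
        rows.append(cleaned)
    return rows


# The raw payload is stored untouched under the raw key; cleaning happens later.
raw_payload = "id,amount\n 1 , 10.5 \n,3\n2,\n"
landing_key = raw_key("marketing", "leads.csv", date(2024, 1, 5))
clean_rows = transform(raw_payload)
```

Keeping the raw payload unchanged means that a bug in the cleaning logic can be fixed and the transform rerun without asking the source systems to resend anything.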