Data ingestion using AWS Glue
In Chapter 2, as part of our data lake design, we introduced the Glue Data Catalog, one of the key components of that design. Glue is also a popular ETL tool among data engineers who want to ingest data from source systems and transform it as it moves between the different layers of the data lake. Glue is flexible enough to handle a wide range of data engineering tasks: in essence, Glue ETL can extract data from many kinds of source systems, transform it, and load it into a variety of target systems.
Because this chapter is about batch data ingestion, and we want to keep our focus on ingesting data into the S3-based data lake, we will concentrate on those use cases here. A later chapter is dedicated to data processing, and there we will revisit Glue ETL in more depth.
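Glue ETL jobs can also be defined programmatically. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) and a Spark-based ingestion script that has already been uploaded to S3. It only builds the keyword arguments for the Glue `create_job` API, so it runs without AWS credentials. The job name, IAM role ARN, bucket paths, and the `--target_path` argument are hypothetical placeholders, and the Glue version, worker type, and worker count are illustrative choices rather than recommendations.

```python
from typing import Any, Dict


def build_glue_ingestion_job(
    job_name: str,
    role_arn: str,
    script_s3_path: str,
    target_s3_path: str,
    workers: int = 2,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's Glue client create_job call.

    Every name and path passed in here is a hypothetical placeholder.
    """
    # Both the ETL script and the ingestion target are expected to live in S3
    for path in (script_s3_path, target_s3_path):
        if not path.startswith("s3://"):
            raise ValueError(f"expected an S3 URI, got {path!r}")
    if workers < 1:
        raise ValueError("workers must be a positive integer")
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark-based ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",  # illustrative choice
        "WorkerType": "G.1X",  # illustrative choice
        "NumberOfWorkers": workers,
        "DefaultArguments": {
            # Job bookmarks let repeated batch runs skip data already processed
            "--job-bookmark-option": "job-bookmark-enable",
            # Hypothetical custom argument that the ETL script would read
            "--target_path": target_s3_path,
        },
    }


# Usage with hypothetical placeholder values
job = build_glue_ingestion_job(
    "greatfin-batch-ingest",
    "arn:aws:iam::123456789012:role/GlueIngestRole",
    "s3://example-bucket/scripts/ingest.py",
    "s3://example-bucket/raw/",
)
# With credentials configured, the job could then be registered with:
# import boto3
# boto3.client("glue").create_job(**job)
```

Keeping the request construction separate from the API call makes the job definition easy to review and test before anything is created in an AWS account.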
Use case for data ingestion using modern ETL techniques
The business at GreatFin wants to derive value from all the data available in its existing data stores; some are stored in older-generation...