Putting it all together
Now that we have covered the major components of AWS Glue, let's look at how the pieces fit together. The following diagram illustrates this:
In the preceding diagram, we can see the steps that take place when AWS Glue runs. They are explained in the following points:
- The first step is for the crawlers to scan the data sources and extract metadata from them.
- This metadata can then be used to seed the AWS Glue Data Catalog.
- The catalog metadata can then be used by other AWS services, such as Amazon Athena, Amazon Redshift Spectrum, and Amazon EMR. These services rely on the table definitions in the AWS Glue Data Catalog to query the ingested data.
- Finally, the results of these queries can be consumed by other AWS services, for example for visualization in Amazon QuickSight or for machine learning in Amazon SageMaker.
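The steps above can be sketched in Python as a rough illustration. This is a minimal sketch, not a definitive implementation: it assumes the `glue` and `athena` arguments are boto3 clients, and the helper names, crawler name, database name, and S3 output location are hypothetical placeholders rather than anything defined in this chapter.

```python
import time


def run_crawler_and_wait(glue, crawler_name, poll_seconds=10, sleep=time.sleep):
    """Start a Glue crawler and block until it returns to the READY state."""
    glue.start_crawler(Name=crawler_name)
    while True:
        # A crawler reports READY, RUNNING, or STOPPING.
        state = glue.get_crawler(Name=crawler_name)["Crawler"]["State"]
        if state == "READY":
            return
        sleep(poll_seconds)


def list_catalog_tables(glue, database):
    """Return the names of the tables registered in a Data Catalog database.

    Only the first page of results is read; a large catalog would need
    to follow the NextToken value in the response.
    """
    response = glue.get_tables(DatabaseName=database)
    return [table["Name"] for table in response["TableList"]]


def start_athena_query(athena, database, sql, output_location):
    """Submit a query to Athena that resolves tables through the Data Catalog."""
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

With boto3 installed and AWS credentials configured, the clients would typically be created with `boto3.client("glue")` and `boto3.client("athena")`. One would call `run_crawler_and_wait` first, then `list_catalog_tables` to confirm what the crawler found, and finally `start_athena_query` with an S3 location for the results. Passing the clients in as arguments keeps the helpers easy to exercise with stand-in objects, without touching a real AWS account.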
A wide variety of data sources can be ingested with AWS Glue...