Integrating data with AWS Glue
AWS Glue was initially introduced as a serverless ETL service that allowed users to crawl, catalog, transform, and ingest data into AWS for analytics. Over the years, however, it has evolved into a fully managed, serverless data integration service.
AWS Glue simplifies data integration, which, as discussed earlier, usually involves discovering, preparing, extracting, and combining data from different data stores for analysis. Within an organization, these tasks are often handled by multiple individuals or teams with diverse skill sets.
As mentioned in the previous section, data integration is an iterative process that involves several steps. Let’s take a look at how AWS Glue can be used to perform some of these tasks.
Data discovery
The AWS Glue Data Catalog can be used to discover and search data across all our datasets. The Data Catalog stores table metadata for our datasets, such as schemas and data locations, and makes it easy to query these datasets...
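To make the discovery step more concrete, here is a minimal sketch in Python. It assumes the boto3 SDK's Glue client and its `search_tables` call, which returns matching tables in `TableList` and a `NextToken` when more pages remain. The helper names `search_catalog` and `describe_table` are hypothetical. The client is passed in as a parameter, so the logic can be exercised without an AWS account.

```python
from typing import Any, Dict, Iterator, Optional


def search_catalog(glue_client: Any, text: str) -> Iterator[Dict[str, Any]]:
    """Yield every catalog table whose metadata matches `text`.

    Hypothetical helper: repeatedly calls the Glue SearchTables API,
    passing NextToken until the service stops returning one.
    """
    next_token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"SearchText": text}
        if next_token:
            kwargs["NextToken"] = next_token
        response = glue_client.search_tables(**kwargs)
        # Each entry is a table definition stored in the Data Catalog.
        yield from response.get("TableList", [])
        next_token = response.get("NextToken")
        if not next_token:
            break


def describe_table(table: Dict[str, Any]) -> str:
    """Summarize a table's database, name, and column schema (hypothetical helper)."""
    columns = table.get("StorageDescriptor", {}).get("Columns", [])
    cols = ", ".join(f"{c['Name']}:{c.get('Type', '')}" for c in columns)
    return f"{table['DatabaseName']}.{table['Name']} ({cols})"


# Example usage (assumes boto3 is installed and AWS credentials allow glue:SearchTables):
#   import boto3
#   for table in search_catalog(boto3.client("glue"), "sales"):
#       print(describe_table(table))
```

Wrapping pagination in a generator lets callers stop early or collect every match with `list(...)`. Table visibility in the results still depends on the caller's catalog permissions.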