AWS Glue
AWS Glue is a fully managed, serverless data integration and ETL (extract, transform, load) service. It can extract, clean, and transform data from a wide range of sources, so you can build accurate data models that can be imported into a database, loaded into an analytics platform, or used to train machine learning models.
You can control AWS Glue from either the AWS Management Console or the AWS CLI, which lets you automate data handling and data loading into your databases.
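As an illustration of that kind of automation, here is a minimal sketch that builds the request for a time-based trigger using the AWS SDK for Python (boto3) rather than the Console or CLI. The trigger name `nightly-load`, the job name `orders-etl`, and the helper functions are hypothetical placeholders, and the sketch assumes a Glue job with that name already exists. The call to AWS is kept in its own function so the request can be inspected without credentials.

```python
from typing import Any, Dict


def build_scheduled_trigger(trigger_name: str, job_name: str, cron: str) -> Dict[str, Any]:
    """Build keyword arguments for a time-based AWS Glue trigger.

    `cron` is the six-field expression Glue expects:
    minutes, hours, day-of-month, month, day-of-week, year.
    """
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron})",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def create_trigger(params: Dict[str, Any]) -> None:
    """Send the request to AWS Glue (needs boto3 and valid AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without it

    boto3.client("glue").create_trigger(**params)


# Hypothetical names; replace them with your own trigger and job.
params = build_scheduled_trigger("nightly-load", "orders-etl", "0 2 * * ? *")
print(params["Schedule"])  # prints: cron(0 2 * * ? *)
```

Calling `create_trigger(params)` would register the schedule, which in this sketch runs the job every day at 02:00 UTC.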
AWS Glue is built from three main components:
- AWS Glue Data Catalog: This is a central repository that holds information about your data. It acts as an index to your schema and data stores, which helps control your ETL jobs.
- Job Scheduling System: This is a highly customizable scheduler. In addition to time-based scheduling, it supports event-driven scheduling and can watch for new files or new data that needs to be processed.
- ETL Engine: AWS Glue's ETL engine is the...