Monitoring AWS Glue jobs with CloudWatch alarms
AWS Glue is a serverless service for running extract, transform, and load (ETL) operations in AWS. Glue connects to multiple data sources to retrieve data and transform it into different structures and formats. Transformations are written in Python or Scala, the two main language options within Glue. Glue also integrates closely with Athena: tables defined in Athena can be read within Glue and used for further ETL processing.
We will work through a simple example that transforms the CloudTrail trail data we queried previously in Athena. We will create a Glue job that transforms the original data, converts it to a different data format (Parquet), and writes it to another S3 bucket. We can then review the job's logs and activity, along with the metrics used to measure the success or failure of a Glue job. These are the steps to create a Glue job:
- Navigate to Services | Glue in the AWS Management Console. ...
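
For readers who prefer scripting to the console, the same kind of job can also be defined programmatically. The following is a minimal sketch, not the procedure this chapter follows: it assumes the boto3 library and valid AWS credentials, and the helper names build_glue_job_definition and create_glue_job, the job name, the role ARN, and the script path are all hypothetical placeholders. It shows the arguments that boto3's Glue create_job call expects for a Python Spark ETL job, with job metrics enabled so that CloudWatch has data to alarm on.

```python
from typing import Any, Dict


def build_glue_job_definition(
    job_name: str,
    role_arn: str,
    script_s3_path: str,
) -> Dict[str, Any]:
    """Build keyword arguments for the boto3 Glue client's create_job call.

    All three parameters are placeholders supplied by the caller.
    """
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3_path,  # S3 path of the ETL script
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            "--job-language": "python",
            # Ask Glue to publish job metrics to CloudWatch.
            "--enable-metrics": "true",
        },
    }


def create_glue_job(definition: Dict[str, Any]) -> Dict[str, Any]:
    """Create the job in AWS. Requires boto3 and configured credentials."""
    import boto3  # imported here so the builder above works without boto3

    glue = boto3.client("glue")
    return glue.create_job(**definition)
```

A call such as create_glue_job(build_glue_job_definition("cloudtrail-to-parquet", "arn:aws:iam::123456789012:role/ExampleGlueRole", "s3://example-bucket/scripts/job.py")) would submit the definition; every value in that call is an illustrative assumption. Keeping the definition in a plain dictionary makes it easy to inspect or test before anything is sent to AWS.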