Technical requirements
To follow along with this chapter, you will need a workstation running Linux, macOS, or Windows with at least 7 GB of storage and 4 GB of RAM. The code snippets can be run directly on AWS Glue, which requires an AWS account, but most of them can also be run locally on your workstation. The code snippets are available in this book’s GitHub repository at https://github.com/PacktPublishing/Serverless-ETL-and-Analytics-with-AWS-Glue/tree/main/Chapter03.
There are several options for setting up a Glue development environment on your workstation. Refer to the AWS Glue documentation at https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html for instructions on each option.
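If you want to confirm that your workstation meets the storage and memory figures stated above before setting up a local environment, a quick check along the following lines may help. This is only a minimal sketch using the Python standard library: the function names are hypothetical, it interprets the 7 GB figure as free disk space at the current path (an assumption), and the memory check relies on `os.sysconf`, which is not available on Windows, so it returns `None` there rather than guessing.

```python
import os
import shutil

# Thresholds taken from the requirements stated in this section.
REQUIRED_STORAGE_GB = 7
REQUIRED_RAM_GB = 4

BYTES_PER_GB = 1024 ** 3


def free_storage_gb(path="."):
    """Return the free disk space (in GB) on the volume that holds `path`."""
    return shutil.disk_usage(path).free / BYTES_PER_GB


def total_ram_gb():
    """Return total physical RAM in GB, or None if it cannot be determined.

    os.sysconf does not exist on Windows, and some platforms may not
    expose these names, so failures are reported as None.
    """
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None
    return page_size * page_count / BYTES_PER_GB


def meets_requirement(available, required):
    """Return True only when a measured value is known and large enough."""
    return available is not None and available >= required


if __name__ == "__main__":
    storage = free_storage_gb()
    ram = total_ram_gb()
    print(f"Free storage: {storage:.1f} GB "
          f"(enough: {meets_requirement(storage, REQUIRED_STORAGE_GB)})")
    if ram is None:
        print("RAM: could not be determined on this platform")
    else:
        print(f"Total RAM: {ram:.1f} GB "
              f"(enough: {meets_requirement(ram, REQUIRED_RAM_GB)})")
```

When a value cannot be measured, the sketch reports it as unknown instead of assuming the requirement is met, so on Windows you would need to check the installed memory through the operating system's own tools.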
Now, let’s explore how we can ingest data from different types of data stores one by one.