Introduction to AWS Glue
What is AWS Glue?
AWS Glue is a fully managed service for extracting data from data sources, loading it into other AWS services such as Amazon S3, and transforming it so that downstream services or users can consume it. Under the hood, AWS Glue runs Apache Spark in a serverless environment. This lets it process terabyte- and petabyte-scale workloads, but it also comes with some overhead (for example, job startup time and a minimum allocation of compute capacity). For that reason it is not a good fit for small batches and files, and it should be reserved for big data projects.
Another important feature of AWS Glue is that it can handle disparate sources, such as relational (SQL) and NoSQL databases, not just files stored in Amazon S3.
As we mentioned in the introduction, it is hard to overestimate the value of data in the current environment. Regardless of the industry, properly harnessing data is critical to staying competitive. AWS Glue and its underlying Apache Spark engine function as cornerstone technologies in many enterprises to process...