Consuming streaming data using Glue
Now that we understand how Glue works in batch mode, let’s look at how to process data that arrives through a stream.
The CloudFormation stack creates a Managed Streaming for Apache Kafka (MSK) cluster for this purpose. You will have to create a Glue connection for this MSK cluster. It is important that you name this connection chapter-data-analysis-msk-connection. This connection is used in the jobs that follow. These jobs get the Kafka broker details from the connection.
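To illustrate how a job could obtain the broker details from the connection, here is a minimal sketch. It assumes a boto3 Glue client is passed in and that the connection stores its brokers under the KAFKA_BOOTSTRAP_SERVERS property, which is the key Glue uses for Kafka connections. The helper name get_bootstrap_servers is hypothetical, and the preconfigured jobs in this chapter may read the connection differently.

```python
from typing import Any, List

# Connection name required by the chapter's preconfigured jobs.
CONNECTION_NAME = "chapter-data-analysis-msk-connection"


def get_bootstrap_servers(glue_client: Any,
                          connection_name: str = CONNECTION_NAME) -> List[str]:
    """Return the Kafka brokers stored in a Glue connection (hypothetical helper).

    glue_client is expected to behave like boto3.client("glue").
    """
    response = glue_client.get_connection(Name=connection_name)
    properties = response["Connection"]["ConnectionProperties"]
    # Glue stores the brokers as one comma-separated string.
    servers = properties["KAFKA_BOOTSTRAP_SERVERS"]
    return [s.strip() for s in servers.split(",") if s.strip()]
```

With boto3 installed and AWS credentials configured, this could be called as get_bootstrap_servers(boto3.client("glue")). Passing the client in as an argument keeps the helper easy to exercise without an AWS account.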
Creating chapter-data-analysis-msk-connection
We will execute Glue jobs to load data into an MSK topic and to consume data from that topic. Both of these jobs require broker information and other details about the MSK cluster, so we will now create an MSK connection in Glue. Make sure that you name the connection chapter-data-analysis-msk-connection, because the Glue jobs have been preconfigured to use this name as the connection...
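For readers who prefer to script this step instead of using the console, the following is a rough sketch using the boto3 Glue API. The broker string, subnet, security group, and Availability Zone are placeholders you would take from your own MSK cluster, and the helper names build_connection_input and create_msk_connection are hypothetical. It also assumes the cluster accepts TLS connections; adjust KAFKA_SSL_ENABLED if yours does not.

```python
from typing import Any, Dict, List

CONNECTION_NAME = "chapter-data-analysis-msk-connection"


def build_connection_input(bootstrap_servers: str,
                           subnet_id: str,
                           security_group_ids: List[str],
                           availability_zone: str) -> Dict[str, Any]:
    """Build the ConnectionInput payload for a Kafka-type Glue connection."""
    return {
        "Name": CONNECTION_NAME,
        "ConnectionType": "KAFKA",
        "ConnectionProperties": {
            # Comma-separated host:port list from the MSK cluster.
            "KAFKA_BOOTSTRAP_SERVERS": bootstrap_servers,
            # Assumption: the cluster is reached over TLS.
            "KAFKA_SSL_ENABLED": "true",
        },
        # Network placement so Glue can reach the brokers inside the VPC.
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": security_group_ids,
            "AvailabilityZone": availability_zone,
        },
    }


def create_msk_connection(glue_client: Any,
                          connection_input: Dict[str, Any]) -> None:
    """Create the connection; glue_client should be boto3.client("glue")."""
    glue_client.create_connection(ConnectionInput=connection_input)
```

Keeping the payload construction separate from the API call makes it possible to inspect the connection name and properties before anything is created in your account.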