Use-case example – Bikeshare station data pipeline with KDF
The use case is to take streaming data from bike stations spread across multiple New York locations, decorate it with address information stored in an Amazon DynamoDB table, and then buffer and aggregate the data. The result lands in an S3 bucket that forms a data lake for historical analysis and insights using big-data query tools such as Apache Hive and Amazon Athena, a serverless query service based on Apache Presto.
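The decoration step is typically implemented as a KDF transformation Lambda function that enriches each record in flight. The following is a minimal sketch of such a function; the table name `BikeStationAddresses` and the `station_id`/`address` attribute names are illustrative assumptions, not part of the original pipeline:

```python
import base64
import json


def lookup_address(station_id):
    """Fetch a station's address from DynamoDB (assumed table/key names)."""
    import boto3  # imported lazily so the module can be tested offline

    table = boto3.resource("dynamodb").Table("BikeStationAddresses")
    item = table.get_item(Key={"station_id": station_id}).get("Item", {})
    return item.get("address", "UNKNOWN")


def handler(event, context, lookup=lookup_address):
    """KDF transformation Lambda: decorate each record with its address.

    Firehose delivers base64-encoded records; each transformed record
    must be returned with the same recordId and a result of 'Ok',
    'Dropped', or 'ProcessingFailed'.
    """
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["address"] = lookup(payload["station_id"])
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(json.dumps(payload).encode()).decode(),
        })
    return {"records": output}
```

The `lookup` parameter is injectable only to make the function easy to unit test; in the deployed Lambda the default DynamoDB lookup is used.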
The architecture for delivering data to S3 in Parquet format using KDF for analysis with big-data tools is depicted in the following diagram:
This architecture is part of the one described in Chapter 4, Kinesis Data Streams, under the Data Pipelines with Amazon Kinesis Data Streams section, and expands on the Amazon KDF portion of that architecture.
The code and configuration files referenced...