Using Google Cloud Platform for data processing
Google Cloud Platform offers Cloud Dataflow as a data processing service for both batch and real-time streaming applications. The service is aimed at data scientists and analytics application developers who need to set up pipelines for data analysis and data processing. Cloud Dataflow uses Apache Beam under the hood. Apache Beam originated at Google, but it is now an open source project under the Apache Software Foundation. It offers a programming model for building data processing pipelines. Such pipelines can be created using Apache Beam and then executed using the Cloud Dataflow service.
The Google Cloud Dataflow service is similar to Amazon Kinesis, Apache Storm, Apache Spark, and Facebook Flux. Before we discuss how to use Google Dataflow with Python, we will introduce Apache Beam and its pipeline concepts.
Learning the fundamentals of Apache Beam
In the current era, data is like a cash cow...