Designing and building data pipelines
A data pipeline for big data systems must be able to integrate, consolidate, and transform many different types of data from various sources. Other useful supporting capabilities include data discovery, preparation, and management. Let's look at each of these.
Data integration
Google Cloud has a suite of services for data integration. Cloud Dataflow is a unified stream and batch processing service based on Apache Beam; it is fully managed and serverless, with horizontal autoscaling. With Dataflow you create Apache Beam pipelines: data integration pipelines that read, transform, and ingest data. If you're unfamiliar with Apache Beam pipelines and their use cases, one simple example is a pipeline that writes different data subsets to different data stores based on a filter. Suppose, for instance, that you have a database of books identified by their titles. Your pipeline could route the titles that match a filter, say, those beginning with "A", to one data store and write the remaining titles to another.
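A minimal sketch of that idea in the Beam Python SDK appears below. The in-memory list of titles, the "starts with A" predicate, and the local text-file output paths are all illustrative assumptions standing in for real sources, filters, and data stores; the point is only to show how one input branches into two filtered outputs within a single pipeline.

```python
import apache_beam as beam

# Sketch: split one collection of book titles into two subsets based on
# a filter and write each subset to its own sink. In a real pipeline the
# Create/WriteToText steps would be replaced by connectors to actual
# data stores (e.g., BigQuery, Cloud Storage, Pub/Sub).
with beam.Pipeline() as pipeline:
    titles = pipeline | "ReadTitles" >> beam.Create(
        ["A Tale of Two Cities", "Brave New World", "Bleak House"]
    )

    # Branch 1: titles beginning with "A" go to the first output.
    (titles
     | "StartsWithA" >> beam.Filter(lambda title: title.startswith("A"))
     | "WriteA" >> beam.io.WriteToText("output/a_titles"))

    # Branch 2: all remaining titles go to the second output.
    (titles
     | "Others" >> beam.Filter(lambda title: not title.startswith("A"))
     | "WriteOthers" >> beam.io.WriteToText("output/other_titles"))
```

The same pipeline code runs unchanged on Dataflow by supplying the DataflowRunner and project options at launch, which is what makes Beam pipelines portable between local testing and the managed service.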