Data Fusion
Cloud Data Fusion is a fully managed, cloud-native data integration service for quickly building and managing data pipelines. It uses Dataproc as the execution environment for these pipelines. The GUI caters to a variety of users: business users, developers, and data scientists can all build integration solutions that cleanse, prepare, and transform data. Data Fusion also offers a library of preconfigured plugins to extend its capabilities. It is powered by an open source project called Cask Data Application Platform (CDAP).
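To make the cleanse, prepare, and transform stages more concrete, here is a minimal plain-Python sketch of the kind of staged pipeline that Data Fusion lets users assemble visually. It does not call any Data Fusion or CDAP API. The record fields (name, email) and the function names (cleanse, prepare, transform, run_pipeline) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Stage = Callable[[Iterable[Record]], Iterable[Record]]


def cleanse(records: Iterable[Record]) -> Iterable[Record]:
    """Drop records whose email field is missing or blank."""
    for record in records:
        if record.get("email", "").strip():
            yield record


def prepare(records: Iterable[Record]) -> Iterable[Record]:
    """Normalize the email field: trim whitespace and lowercase it."""
    for record in records:
        yield {**record, "email": record["email"].strip().lower()}


def transform(records: Iterable[Record]) -> Iterable[Record]:
    """Reshape each record, deriving a domain field from the email."""
    for record in records:
        email = record["email"]
        yield {"email": email, "domain": email.split("@")[-1]}


def run_pipeline(source: Iterable[Record], stages: List[Stage]) -> List[Record]:
    """Feed the source through each stage in order and collect the output."""
    data = source
    for stage in stages:
        data = stage(data)
    return list(data)


# Hypothetical input: one usable record and one that cleansing should drop.
raw = [
    {"name": "Ada", "email": " Ada@Example.com "},
    {"name": "Bob", "email": ""},
]

result = run_pipeline(raw, [cleanse, prepare, transform])
print(result)
```

In Data Fusion itself, each of these stages would roughly correspond to a plugin placed on the pipeline canvas, with Dataproc doing the actual execution rather than a local Python process.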
Core concepts
Some core Data Fusion concepts are worth highlighting to understand the product and how to use it.
Instances
To begin with, we must create a Cloud Data Fusion instance. Instances run as the Compute Engine service account, and Data Fusion executes pipelines using a Dataproc cluster. Each instance is a unique deployment of Data Fusion and is created from the...