Introduction to CDC and Datastream
Now that we’ve learned about streaming with Pub/Sub and Dataflow, let’s look more closely at how the data gets pushed from the source system to BigQuery in the first place. Unfortunately, in the real world, there are many cases in which you can’t change the source system’s code at all. This means that you can’t add a Pub/Sub publisher to the application to publish its records for streaming.
This can happen for many reasons. Take a banking organization, for example: the core application is usually a monolithic product developed by a third-party vendor. Even if it’s developed internally, the complexity of a core banking system makes it difficult to change the code to add a Pub/Sub publisher at every point where data is produced. How can we solve this?
Think back to what we learned about batch pipelines, where we had to extract data from tables in databases: we exported each database table into files and loaded those files into BigQuery. Can we do the same for streaming?
Unfortunately...