Accessing the data sources
TensorFlow Enterprise can easily access data sources in Google Cloud Storage as well as BigQuery. Either of these data sources can host gigabytes to terabytes of data. At that scale, reading the entire training dataset into the JupyterLab runtime's memory is out of the question; instead, data ingestion is handled by streaming the data in batches during training. The tf.data
API is the way to build such a data ingestion pipeline, aggregating data from files in a distributed storage system. From there, the data object can go through transformation steps and evolve into a new dataset that is ready for training.
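As a quick illustration, the following is a minimal sketch of such a pipeline streaming TFRecord files from a Cloud Storage bucket. The bucket path and feature specification here are hypothetical placeholders, not values from this book's examples:

```python
import tensorflow as tf

# Hypothetical bucket path and feature spec; replace with your own.
TRAIN_PATTERN = "gs://my-bucket/training-data/part-*.tfrecord"

feature_spec = {
    "feature": tf.io.FixedLenFeature([10], tf.float32),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    # Deserialize one tf.Example record into a dict of tensors.
    return tf.io.parse_single_example(serialized, feature_spec)

# Stream records from Cloud Storage rather than loading files into memory.
files = tf.data.Dataset.list_files(TRAIN_PATTERN)
dataset = (
    tf.data.TFRecordDataset(files)   # reads directly from gs:// paths
    .map(parse_example, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    .shuffle(buffer_size=10_000)
    .batch(128)
    .prefetch(tf.data.experimental.AUTOTUNE)
)

# The resulting dataset can be passed straight to model.fit(dataset).
```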
In this section, we are going to learn basic coding patterns for the following tasks:
- Reading data from a Cloud Storage bucket
- Reading data from a BigQuery table
- Writing data into a Cloud Storage bucket
- Writing data into a BigQuery table
After this, you will have a good grasp of reading and writing data to a Google Cloud data source.
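As a brief preview of the BigQuery case, the general reading pattern uses the BigQuery connector from the tensorflow-io package. The sketch below assumes tensorflow-io is installed, and the project, dataset, table, and column names are hypothetical placeholders:

```python
import tensorflow as tf
from tensorflow_io.bigquery import BigQueryClient

# Hypothetical identifiers; replace with your own project, dataset, and table.
PROJECT_ID = "my-gcp-project"
DATASET_ID = "my_dataset"
TABLE_ID = "my_table"

client = BigQueryClient()
read_session = client.read_session(
    "projects/" + PROJECT_ID,
    PROJECT_ID,
    TABLE_ID,
    DATASET_ID,
    selected_fields=["feature_1", "feature_2", "label"],
    output_types=[tf.float64, tf.float64, tf.int64],
    requested_streams=2,
)

# parallel_read_rows() yields a tf.data.Dataset of column-name-to-tensor
# mappings, so the usual shuffle/batch/prefetch transformations apply.
dataset = read_session.parallel_read_rows().shuffle(10_000).batch(128)
```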