ML data storage and processing
As we discussed in Chapter 4, Developing and Deploying ML Models, data storage involves collecting raw data from various sources and storing it in a centralized repository. Data processing, in turn, includes both data engineering and feature engineering. Data engineering converts raw data (data in its source form) into prepared data (a dataset that is ready to be used as input to ML tasks). Feature engineering then transforms the prepared data into the features expected by the ML model.
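To make the distinction between the two processing stages concrete, here is a minimal sketch in plain Python. The records, field names, and the helper functions prepare and make_features are hypothetical and chosen only for illustration. The cleaning rules (drop rows with a missing age, drop duplicate users) and the feature transforms (min-max scaling and one-hot encoding) are assumptions, not a prescribed method.

```python
# Hypothetical raw records, as they might arrive from a source system.
RAW_ROWS = [
    {"user_id": "1", "age": "34", "plan": "basic"},
    {"user_id": "2", "age": "", "plan": "premium"},     # missing age
    {"user_id": "3", "age": "52", "plan": "Premium "},  # inconsistent casing and whitespace
    {"user_id": "1", "age": "34", "plan": "basic"},     # duplicate of the first row
]


def prepare(rows):
    """Data engineering: raw data -> prepared data.

    Parses types, normalizes categorical values, and drops rows that
    have a missing age or repeat an already-seen user_id.
    """
    seen = set()
    prepared = []
    for row in rows:
        if not row["age"].strip():
            continue  # assumed rule: skip rows with a missing age
        user_id = int(row["user_id"])
        if user_id in seen:
            continue  # assumed rule: keep only the first row per user
        seen.add(user_id)
        prepared.append(
            {
                "user_id": user_id,
                "age": int(row["age"]),
                "plan": row["plan"].strip().lower(),
            }
        )
    return prepared


def make_features(prepared):
    """Feature engineering: prepared data -> model features.

    Min-max scales age to [0, 1] and one-hot encodes plan, with the
    one-hot columns in sorted order of the plan names.
    """
    ages = [r["age"] for r in prepared]
    lo, hi = min(ages), max(ages)
    span = (hi - lo) or 1  # avoid division by zero when all ages are equal
    plans = sorted({r["plan"] for r in prepared})
    return [
        [(r["age"] - lo) / span] + [1.0 if r["plan"] == p else 0.0 for p in plans]
        for r in prepared
    ]


prepared = prepare(RAW_ROWS)
features = make_features(prepared)
```

Here, prepared holds typed, de-duplicated records, while features holds one numeric vector per record in the layout [scaled_age, plan_is_basic, plan_is_premium]. In practice, the same two stages would typically run inside a managed data processing service rather than in local Python.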
For structured data, we recommend using Google Cloud BigQuery (BQ) to store and process it. For unstructured data, such as video, audio, and image data, we recommend storing it in Google Cloud Storage (Google Cloud's object storage) and processing it with Google Cloud Dataflow or Dataproc. As we have discussed, Dataflow is a managed service that uses the Apache Beam programming model to convert unstructured data into binary formats and can improve data ingestion...