Building data preparation pipelines
Deep neural networks are best suited to supervised learning problems where we have access to historical datasets, which are used to train the network. As seen in diagram 5.1, the more training data we have at our disposal, the better the network generally becomes at generalizing from the training datasets and accurately predicting outcomes for new, unseen data. For deep neural networks to perform optimally, we need to carefully procure, transform, scale, normalize, join, and split the data. This is very similar to building a data pipeline for a data warehouse or a data lake with ETL (Extract, Transform, and Load, used with a traditional data warehouse) and ELTTT (Extract, Load, and Transform multiple times, used in modern data lakes) pipelines.
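To make the join, split, scale, and normalize steps concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a prescribed implementation: the toy records, the column names (`id`, `age`, `income`, `label`), and the helper functions are all hypothetical. It also assumes one common design choice, namely that scaling statistics are computed on the training rows only and then reused for the test rows.

```python
import random
import statistics


def join_on_key(left, right, key):
    """Inner-join two lists of dicts on a shared key."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle reproducibly, then split into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def min_max_scale(rows, column, lo, hi):
    """Rescale a column with the given bounds; values inside [lo, hi] map to [0, 1]."""
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [{**row, column: (row[column] - lo) / span} for row in rows]


def standardize(rows, column, mean, std):
    """Shift and scale a column to zero mean and unit variance for the given statistics."""
    std = std or 1.0  # avoid division by zero for a constant column
    return [{**row, column: (row[column] - mean) / std} for row in rows]


# Hypothetical toy data from two sources that share an "id" column.
features = [
    {"id": i, "age": 20.0 + 5 * i, "income": 30000.0 + 4000 * i}
    for i in range(1, 10)
]
labels = [{"id": i, "label": i % 2} for i in range(1, 9)]  # id 9 has no label

# 1. Join the sources; the unlabeled row (id 9) is dropped by the inner join.
rows = join_on_key(features, labels, "id")

# 2. Split before computing any statistics, so the test rows stay unseen.
train_rows, test_rows = split_rows(rows, test_fraction=0.25, seed=0)

# 3. Compute scaling statistics on the training rows only.
ages = [row["age"] for row in train_rows]
incomes = [row["income"] for row in train_rows]
age_lo, age_hi = min(ages), max(ages)
income_mean, income_std = statistics.fmean(incomes), statistics.pstdev(incomes)

# 4. Apply the same transformation to both splits.
train_rows = standardize(
    min_max_scale(train_rows, "age", age_lo, age_hi), "income", income_mean, income_std
)
test_rows = standardize(
    min_max_scale(test_rows, "age", age_lo, age_hi), "income", income_mean, income_std
)
```

Because the bounds and statistics come from the training rows, a test value can fall slightly outside the `[0, 1]` range after scaling. That is expected, and it avoids leaking information about the test set into the preparation step.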
We are going to deal with data from a variety of sources in structured and unstructured formats. In order to use the data in deep neural networks, we need...