Design and experimentation
After the theoretical foundation of the problem statement is built, we move to the design and experimentation phase, where we build a proof of concept (POC) by trying out several model implementations. The crucial part of design and experimentation lies in the dataset and its preprocessing. In any data science project, the majority of the time is spent on data cleaning and preprocessing, and deep learning is no exception.
Data preprocessing is one of the vital parts of building a deep learning pipeline. Real-world datasets are usually not cleaned or formatted in a way a neural network can process directly: conversion to floats or integers, normalization, and so on are required before further processing. Building a data processing pipeline is also a non-trivial task that involves writing a lot of boilerplate code. To make this much easier, dataset builders and DataLoader pipeline packages are built into the core of PyTorch.
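As a minimal sketch of this idea, the example below subclasses `torch.utils.data.Dataset` to wrap some hypothetical raw rows, converting them to float tensors and normalizing each feature column, and then feeds it to a `DataLoader` for batching and shuffling. The dataset name and the toy data are illustrative, not from any real project.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TabularDataset(Dataset):
    """Hypothetical example: raw Python lists -> normalized float tensors."""

    def __init__(self, rows, labels):
        data = torch.tensor(rows, dtype=torch.float32)
        # Normalize each feature column to zero mean and unit variance
        self.data = (data - data.mean(dim=0)) / (data.std(dim=0) + 1e-8)
        self.labels = torch.tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

# Toy data standing in for a real-world dataset
rows = [[1.0, 200.0], [2.0, 180.0], [3.0, 220.0], [4.0, 210.0]]
labels = [0, 1, 0, 1]

# DataLoader handles batching and shuffling without extra boilerplate
loader = DataLoader(TabularDataset(rows, labels), batch_size=2, shuffle=True)
for batch_x, batch_y in loader:
    print(batch_x.shape, batch_y.shape)
```

The `Dataset`/`DataLoader` split keeps per-sample preprocessing separate from batching logic, so the same loader code works unchanged when the dataset grows or the preprocessing changes.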