Data collection
Data collection is the process of transforming data from its raw format into a format that a machine learning algorithm can take as input. Depending on the data and the algorithm, this process can take different forms, as illustrated in Figure 2.3:
Figure 2.3 – Different forms of data collection – examples
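As a minimal sketch of what this transformation can look like for measurement data, the following example parses raw text records into numeric feature rows. The sensor export, its column names, and the collect_features helper are hypothetical and exist only for illustration; skipping incomplete records is one possible design choice, not the only one.

```python
import csv
import io

# Hypothetical raw export from a sensor; the column names are made up for this sketch.
RAW_MEASUREMENTS = """timestamp,temperature_c,humidity_pct,status
2023-01-01T00:00:00,21.5,40,ok
2023-01-01T00:10:00,22.1,41,ok
2023-01-01T00:20:00,,43,error
2023-01-01T00:30:00,23.4,42,ok
"""


def collect_features(raw_text):
    """Turn raw CSV text into a list of numeric feature rows."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        # Skip records with a missing measurement rather than guessing a value.
        if not record["temperature_c"] or not record["humidity_pct"]:
            continue
        rows.append([float(record["temperature_c"]), float(record["humidity_pct"])])
    return rows


# X is a plain list of [temperature, humidity] rows that an algorithm can consume.
X = collect_features(RAW_MEASUREMENTS)
print(X)
```

The raw text contains four records, but only the three complete ones end up in X, so the collection step also acts as a first, simple quality filter.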
Data from images and measurements such as time series is usually collected for classification and prediction. Both classes of problems require the ground truth to be available, which we saw as Y_train in the previous code example. These target labels are either extracted automatically from the raw data or added manually through the process of labeling. The manual process is time-consuming, so the automated one is preferred.
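Automatic label extraction can be sketched in two common ways, assuming the raw data already encodes the ground truth. For images, the example assumes a folder layout where the parent folder name is the class; for time series, it assumes that the value following each window is the prediction target. The paths, the helper names, and both conventions are hypothetical and chosen only for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical image paths; "parent folder name = class" is an assumed convention.
image_paths = [
    "dataset/cat/img_001.png",
    "dataset/dog/img_002.png",
    "dataset/cat/img_003.png",
]


def labels_from_paths(paths):
    """Read the class label from the name of the folder that holds each image."""
    return [PurePosixPath(p).parent.name for p in paths]


def windows_with_targets(series, window):
    """Build sliding windows as inputs and the value right after each window as target."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


# Labels for classification, extracted without any manual labeling.
Y_images = labels_from_paths(image_paths)

# Inputs and targets for prediction, extracted from the series itself.
X_series, Y_series = windows_with_targets([1.0, 2.0, 3.0, 4.0, 5.0], window=3)

print(Y_images)
print(X_series, Y_series)
```

In both cases, the labels come directly from the structure of the raw data, which is why no human annotator is needed. When the raw data carries no such signal, manual labeling remains the fallback.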
The data used in unsupervised learning and reinforcement learning models is often extracted as tabular data without labels. This data is used in the decision process or the...