Data and algorithms
Now, if using the algorithms is not the main part of the machine learning code, then something else must be: data handling. Managing data in machine learning software, as shown in Figure 2.1, consists of three areas:
- Data collection.
- Feature extraction.
- Data validation.
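To make the three areas concrete, here is a minimal sketch that chains them as plain Python functions. The function names (`collect_data`, `extract_features`, `validate_data`), the hard-coded records, and the two text features are hypothetical illustrations, not the book's code. The order of the stages is also an assumption: validation can just as well run on the raw data before features are extracted.

```python
from typing import Any

# Hypothetical record type: one row of raw or derived data.
Record = dict[str, Any]


def collect_data() -> list[Record]:
    """Data collection: gather raw records from a source (hard-coded here)."""
    return [
        {"id": 1, "text": "the sensor reported a spike"},
        {"id": 2, "text": "normal operation"},
        {"id": 3, "text": None},  # a defective record with a missing value
    ]


def extract_features(records: list[Record]) -> list[Record]:
    """Feature extraction: turn raw text into simple numeric features."""
    features = []
    for r in records:
        text = r["text"]
        features.append({
            "id": r["id"],
            "n_chars": len(text) if text is not None else None,
            "n_words": len(text.split()) if text is not None else None,
        })
    return features


def validate_data(rows: list[Record]) -> tuple[list[Record], list[Record]]:
    """Data validation: split rows into complete ones and ones with missing values."""
    valid, invalid = [], []
    for row in rows:
        if all(v is not None for v in row.values()):
            valid.append(row)
        else:
            invalid.append(row)
    return valid, invalid


valid, invalid = validate_data(extract_features(collect_data()))
print("valid:", valid)
print("invalid:", invalid)
```

Keeping each area in its own function means each one can be tested and replaced independently, which matters once the real data sources and checks become more complex.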
Although we will return to these areas throughout this book, let’s briefly explore what they contain. Figure 2.2 shows the processing pipeline for these areas:
Figure 2.2 – Data collection and preparation pipeline
Note that preparing the data for the algorithms can become quite complex. First, we need to extract the data from its source, which is usually a database. It can be a database of measurements, images, texts, or any other raw data. Once we have extracted the data we need, we must store it in a raw data format. This can be in the form of a table, as shown in the preceding figure, or it can...
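The extraction step can be sketched with Python's standard `sqlite3` and `csv` modules. This is a minimal sketch under stated assumptions: the in-memory database, the `measurements` table, its columns, and the helper `extract_raw_table` are all hypothetical. The result is stored as a CSV table in a string for brevity; in practice you would write it to a file opened with `newline=""`.

```python
import csv
import io
import sqlite3


def extract_raw_table(conn: sqlite3.Connection, query: str) -> str:
    """Run a query against the source database and return the result as CSV text."""
    cursor = conn.execute(query)
    # cursor.description holds one tuple per column; the first item is the column name.
    header = [col[0] for col in cursor.description]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(cursor.fetchall())
    return buffer.getvalue()


# Hypothetical source: an in-memory database with a 'measurements' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("s1", 20.5), ("s2", 21.0), ("s1", 19.8)],
)

raw_csv = extract_raw_table(conn, "SELECT sensor, value FROM measurements ORDER BY rowid")
print(raw_csv)
```

Keeping the raw export untouched, and doing feature extraction on a copy, makes it possible to rerun later stages without querying the source database again.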