Every data has its purpose – annotations and tasks
Raw data is important, but it is only the first step in developing and operating ML software. The most important step, and the costliest one, is annotating the data. To train an ML model and then use it to make inferences, we need to define a task. A task definition has two parts: conceptual and operational. The conceptual definition states what we want the software to do, whereas the operational definition states how we want to achieve that goal. In practice, the operational definition boils down to specifying what we see in the data and what we want the ML model to identify or replicate.
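To make the two parts of a task definition more tangible, here is a minimal sketch in Python. The class `TaskDefinition`, its fields, and the digit-recognition example are our own illustrative assumptions, not part of any library; the sketch only shows how a conceptual goal can be paired with an operational set of things the model should identify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDefinition:
    """Hypothetical container pairing the two halves of a task definition."""

    goal: str            # conceptual: what we want the software to do
    labels: frozenset    # operational: what the model should identify in the data

    def accepts(self, label: str) -> bool:
        """Return True if a label is part of the operational definition."""
        return label in self.labels


# Illustrative example (an assumption): recognizing handwritten digits.
digit_task = TaskDefinition(
    goal="Recognize which digit is written in an image",
    labels=frozenset(str(d) for d in range(10)),
)

print(digit_task.accepts("1"))    # a label covered by the task
print(digit_task.accepts("cat"))  # a label outside the task
```

Keeping the goal and the label set side by side is one possible design choice; it makes it easy to check that every label we later attach to the data actually belongs to the task we defined.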
Annotations are the mechanism by which we direct ML algorithms. Every piece of data that we use requires some sort of label to denote what it is. For raw data, this annotation can be a label describing what the data point contains. For example, such a label can state that the image contains the number 1 (from the MNIST dataset...