In machine learning (ML), we try to automatically discover rules for mapping input data to a desired output. In this process, it's very important to create appropriate representations of data. For example, if we want to create an algorithm to classify an email as spam/ham, we need to represent the email data numerically. One simple representation could be a binary vector where each component depicts the presence or absence of a word from a predefined vocabulary of words. Also, these representations are task-dependent, that is, representations may vary according to the final task that we desire our ML algorithm to perform.
In the preceding email example, instead of identifying spam/ham if we want to detect sentiment in the email, a more useful representation of the data could be binary vectors where the predefined vocabulary consists of words with positive...