We came across one-hot embeddings while identifying fraudulent emails in Chapter 3, Fraud Detection with Autoencoders. The idea is to represent each word as a basis vector; that is, a vector with zeros except one coordinate. Hence, each document (a review in this case) is represented as a vector with ones and zeros. We went a bit further from that and used different weighting (tf-idf).
Let's revisit this model once again, but include n-grams instead of single words. This will be our benchmark for the more sophisticated word embeddings we will do later.Â