Distributed Representation for Text
Why are word embeddings so popular? Why do we claim they are so powerful? What makes them special? To understand and appreciate word embeddings, we first need to acknowledge the shortcomings of the representations we have seen so far.
The terms "footpath" and "sidewalk" are synonyms. Do you think the approaches we've discussed so far can capture this? You could manually replace every "sidewalk" with "footpath" so that both map to the same token, but can you do this for every synonym pair in the language?
The terms "hot" and "cold" are antonyms. Do the previous Bag-of-Words representations capture this? What about "dog" being a type of "animal"? "Cockpit" being a part of a "plane"? Differentiating between a dog's bark and a tree's bark? Can you handle all these cases manually?
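To see concretely why Bag-of-Words representations miss these relationships, here is a minimal sketch over a hypothetical toy vocabulary: one-hot (and, by extension, BoW) vectors for any two distinct words are orthogonal, so synonyms and antonyms alike come out with a similarity of exactly zero.

```python
import numpy as np

# Hypothetical toy vocabulary for illustration.
vocab = ["footpath", "sidewalk", "hot", "cold", "dog"]

def one_hot(word):
    """Return the one-hot vector for `word` over the toy vocabulary."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Synonyms ("footpath"/"sidewalk") and antonyms ("hot"/"cold") alike
# score exactly 0 -- the representation is blind to meaning.
print(cosine(one_hot("footpath"), one_hot("sidewalk")))  # → 0.0
print(cosine(one_hot("hot"), one_hot("cold")))           # → 0.0
```

Any two different words get distinct basis vectors, so their dot product (and hence cosine similarity) is zero regardless of how related they are in meaning.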
All the...