Distributed representations
Distributed representations attempt to capture the meaning of a word by considering its relations with the other words that appear in its context. The underlying idea, known as the distributional hypothesis, is captured in this quote from J. R. Firth, one of the linguists who first proposed it:
"You shall know a word by the company it keeps."
How does this work? By way of example, consider the following pair of sentences:
Paris is the capital of France.
Berlin is the capital of Germany.
Even assuming no knowledge of world geography, the sentence pair implies some sort of relationship between the entities Paris, France, Berlin, and Germany that could be represented as:
"Paris" is to "France" as "Berlin" is to "Germany"
Distributed representations are based on the idea that there exists some transformation φ, mapping each word to a vector, such that:
φ("Paris") - φ("France") ≈ φ("Berlin") - φ("Germany")
In other words, a distributed embedding space is one where words that are used in similar contexts...
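The analogy between the two sentences can be sketched with a few toy vectors. This is only an illustration under stated assumptions: the name phi, the three-dimensional vectors, and their values are all hypothetical and hand-picked so that the relation holds, whereas real embeddings are learned from text. The sketch checks that the offset from "France" to "Paris" points in nearly the same direction as the offset from "Germany" to "Berlin", and that adding the first offset to "Germany" lands closest to "Berlin".

```python
import math

# Hypothetical 3-d word vectors, hand-picked for illustration only.
# Real embeddings are learned from large corpora and have far more dimensions.
phi = {
    "Paris":   [0.9, 0.1, 1.0],
    "France":  [1.0, 0.0, 0.1],
    "Berlin":  [0.1, 0.9, 0.9],
    "Germany": [0.0, 1.0, 0.0],
}


def sub(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def add(u, v):
    """Element-wise sum u + v."""
    return [a + b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Offsets that encode the "capital of" relation for each country.
capital_of_france = sub(phi["Paris"], phi["France"])
capital_of_germany = sub(phi["Berlin"], phi["Germany"])

# If the relation is captured, the two offsets point in similar directions.
offset_similarity = cosine(capital_of_france, capital_of_germany)

# Analogy query: start at "Germany", apply the "Paris minus France" offset,
# and pick the word whose vector is most similar to the result.
target = add(capital_of_france, phi["Germany"])
nearest = max(phi, key=lambda word: cosine(phi[word], target))

print(f"offset similarity: {offset_similarity:.3f}")
print(f"nearest word to Paris - France + Germany: {nearest}")
```

With a realistic vocabulary, the words used in the query are usually excluded from the candidates, since the nearest vector to the target is often one of the inputs themselves. The toy values here are chosen so that this step is not needed.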