One of the major drawbacks of the method that we discussed previously (for obtaining document-level vector representations) is that the words present in the document are given equal weight age. Such an approach suppresses the importance of certain words in the sentence, which actually add value to the meaning of the document. Let us discuss this point with a concrete example of how the method builds a document-level vector representation. Take a look at the sentence, Jack and Jill went up the hill to fetch a pail of water. In the sentence, the words that provide the most context about the action happening are Jack, Jill, hill, fetch, pail, and water. However, the preceding method builds the representation with equal weight given to every single word in the sentence.
An interesting approach that has been discussed is using the weighted average of word vectors, instead...