To see why a document vector is useful, let's go through the following intuition.
The word "bank" is used in the context of finance and also in the context of a river. How do we identify whether the word "bank" in a given sentence or document relates to the topic of a river or the topic of finance?
This problem can be addressed by adding a document vector, which is generated in much the same way as a word vector, except that a one-hot encoded version of the paragraph ID is supplied as an additional input alongside the context words.
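The input layout described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two-paragraph toy corpus, the helper names `one_hot` and `build_input`, and the choice to represent the context as word counts over the vocabulary are all hypothetical and are not taken from any particular library.

```python
# Hypothetical toy corpus: two paragraphs that both use the word "bank".
paragraphs = [
    "on the bank of river".split(),
    "deposit money in the bank".split(),
]

# Vocabulary index over all words (sorted so the layout is deterministic).
vocab = sorted({word for paragraph in paragraphs for word in paragraph})
word_index = {word: i for i, word in enumerate(vocab)}


def one_hot(index, size):
    """Return a list of zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def build_input(paragraph_id, context_words):
    """Append a one-hot paragraph ID to the context-word counts."""
    context = [0] * len(vocab)
    for word in context_words:
        context[word_index[word]] += 1
    return context + one_hot(paragraph_id, len(paragraphs))


# Input: "on the bank of" taken from paragraph 0; target word: "river".
x = build_input(0, ["on", "the", "bank", "of"])
target = one_hot(word_index["river"], len(vocab))
```

The first `len(vocab)` entries of `x` describe the context words, and the trailing `len(paragraphs)` entries identify the paragraph. Two paragraphs that share the context word "bank" therefore still produce different inputs, which is what gives a model room to learn a paragraph-specific signal.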
In the preceding scenario, the paragraph ID captures the delta that the words alone do not. For example, take the sentence "on the bank of river", where "on the bank of" is the input and "river" is the output. The words "on", "the", and "of" do not contribute to the prediction because they occur frequently, while the word "bank" confuses...