A very intuitive approach to representing a document is to use the frequency of the words in that particular document. This is exactly what the bag-of-words (BoW) approach does.
In Chapter 3, Building Your NLP Vocabulary, we saw how to build a vocabulary from a list of sentences. Building the vocabulary is a prerequisite for the BoW methodology. Once the vocabulary is available, each sentence can be represented as a vector whose length equals the size of the vocabulary. Each entry in the vector corresponds to a term in the vocabulary, and the number in that entry is the frequency of the term in the sentence under consideration. The lower limit for this number is 0, indicating that the vocabulary term does not occur in the sentence concerned.
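To make the vector representation concrete, here is a minimal sketch in plain Python. It assumes lowercasing and whitespace tokenization, which may differ from the preprocessing used in Chapter 3. The helper names `build_vocabulary` and `bow_vector` and the two example sentences are hypothetical, chosen only for illustration.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Collect the unique lowercase tokens across all sentences, sorted alphabetically.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    tokens = set()
    for sentence in sentences:
        tokens.update(sentence.lower().split())
    return sorted(tokens)


def bow_vector(sentence, vocabulary):
    """Return one count per vocabulary term, in vocabulary order."""
    counts = Counter(sentence.lower().split())
    # A Counter returns 0 for missing keys, so absent terms get a 0 entry.
    return [counts[term] for term in vocabulary]


# Hypothetical example sentences
sentences = [
    "the cat sat on the mat",
    "the dog chased the cat",
]

vocabulary = build_vocabulary(sentences)
vectors = [bow_vector(sentence, vocabulary) for sentence in sentences]

print(vocabulary)  # ['cat', 'chased', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors[0])  # [1, 0, 0, 1, 1, 1, 2]
print(vectors[1])  # [1, 1, 1, 0, 0, 0, 2]
```

Each vector has seven entries, one per vocabulary term. The entries for terms that do not occur in a sentence are 0, and the entry for "the" is 2 in both sentences because the word occurs twice in each.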
What would be the upper limit for the entry in the vector?
Think!
Well, that could possibly be the frequency of the occurrence...