Identifying parts of speech, handling n-grams, and recognizing named entities
One of the first things that you might want to look at is recognizing the part of speech of a word; it is fundamental to understand whether, in a given sentence, the word checks is a verb or a noun.
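To make the ambiguity concrete, here is a deliberately naive sketch that guesses the tag of checks from the word that precedes it. The function name `guess_checks_tags` and the `DETERMINERS` set are hypothetical and exist only for this illustration; this is not how a real tagger works (a trained tagger, such as the one behind NLTK's `pos_tag`, uses far richer context), but it shows why the surrounding words matter.

```python
# Toy heuristic (an assumption made for illustration, not a real tagger):
# a determiner or possessive right before "checks" suggests a noun;
# anything else is treated as a verb.
DETERMINERS = {"the", "a", "an", "these", "those", "my", "his", "her", "their"}


def guess_checks_tags(tokens):
    """Return (position, tag) for every occurrence of 'checks' in tokens."""
    guesses = []
    for i, token in enumerate(tokens):
        if token.lower() != "checks":
            continue
        previous = tokens[i - 1].lower() if i > 0 else ""
        tag = "NOUN" if previous in DETERMINERS else "VERB"
        guesses.append((i, tag))
    return guesses


verb_example = guess_checks_tags("She checks the mail".split())
noun_example = guess_checks_tags("The checks bounced".split())
```

The same surface form receives a different tag in each sentence, which is exactly the distinction a part-of-speech tagger has to make.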
Useful as it is, part-of-speech tagging will not help you handle bigrams (or, more generally, n-grams): clusters of words that, if analyzed separately in a given context, would lead to an improper understanding of the text. For example, consider the phrase neural networks in an article on machine learning or, more specifically, on an application of neural networks to control packet scheduling and routing in a local network. In the same article, these two words (neural and networks) can also occur on their own, with somewhat different meanings.
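As a minimal sketch of the idea, the following snippet extracts n-grams from a tokenized text using only the standard library and counts them. The helper `ngrams` and the sample sentence are assumptions made for illustration, and splitting on whitespace stands in for a proper tokenizer.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text; whitespace splitting is a simplification.
text = (
    "neural networks can schedule packets in a local network "
    "and neural networks can also route them"
)
tokens = text.lower().split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(ngrams(tokens, 2))

# The bigram keeps "neural networks" together as one unit,
# while the unigram counts treat "networks" and "network" separately.
most_common_bigram = bigram_counts.most_common(1)[0]
```

Counting the bigram as a single unit is what lets an analysis tell the phrase neural networks apart from a stand-alone mention of a local network.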
Finally, when reading an article on politics about a recent meeting of heads of state, we might encounter the word President quite frequently; what would be more interesting to understand is how many times...