Fundamentals of NLP
NLP, at its core, works by splitting a chunk of text (also referred to as a corpus) into individual segments, or tokens, and then analyzing them. These tokens might simply be individual words, but they might also be contractions. Let's look at how a computer might interpret the phrase: I have watered the plants.
If we were to split this corpus into tokens, it would probably look something like this:
['I', 'have', 'watered', 'the', 'plants']
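One way to produce a token list like the one above is a simple regular expression. The following is a minimal sketch rather than a production tokenizer; the function name tokenize and the pattern it uses are illustrative assumptions, and real NLP libraries handle punctuation, contractions, and non-English text far more carefully.

```python
import re

def tokenize(text):
    """Split text into word tokens, dropping punctuation.

    Hypothetical helper for illustration: it keeps runs of letters
    (and apostrophes, so contractions such as "don't" stay whole).
    """
    return re.findall(r"[A-Za-z']+", text)

tokens = tokenize("I have watered the plants.")
print(tokens)
```

Keeping the apostrophe inside the pattern is a design choice: it treats a contraction as a single token instead of splitting it into two.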
The word "the" in our corpus is unnecessary, as it does not help us understand the phrase's intent; the same goes for the word "have". Words like these are commonly called stop words. We should therefore remove the surplus words:
['I', 'watered', 'plants']
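The removal step can be sketched as a filter against a list of surplus words. The STOP_WORDS set and the remove_stop_words function below are hypothetical and cover only this example; libraries such as NLTK and spaCy ship much larger, language-specific lists.

```python
# Hypothetical, deliberately tiny stop-word list for this example only.
STOP_WORDS = {"the", "have", "a", "an"}

def remove_stop_words(tokens):
    """Return the tokens whose lowercase form is not a stop word."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]

tokens = ['I', 'have', 'watered', 'the', 'plants']
filtered = remove_stop_words(tokens)
print(filtered)
```

Comparing in lowercase means that a capitalized "The" at the start of a sentence is removed as well, while the original casing of the remaining tokens is preserved.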
Already, this is starting to look more usable. We have an actor in the form of a personal pronoun ("I"), an action or verb ("watered"), and a recipient or noun ("plants"). From this, we can deduce which action is performed, on what, and by whom. Furthermore, by examining the inflection of the verb "watered", we can establish that this action occurred in the past. Consider...