Before we start looking at the steps for building vocabulary, we need to understand phonemes, graphemes, and morphemes:
- Phonemes are the smallest units of speech sound that can distinguish one word from another in a language. For example, /p/ and /b/ are separate phonemes in English because they distinguish pat from bat.
- Graphemes are letters or groups of letters that represent a single phoneme. The word spoon consists of five letters that represent four phonemes, written with the graphemes s, p, oo, and n.
- A morpheme is the smallest meaningful unit in a language. The word unbreakable is composed of three morphemes:
  - un: a bound morpheme (prefix) signifying "not"
  - break: the root morpheme, which is also a free morpheme because it can stand alone as a word
  - able: a bound morpheme (suffix) signifying "can be done"
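To make the spoon and unbreakable examples concrete, here is a minimal Python sketch of two toy segmenters. It assumes small, hand-picked inventories of multi-letter graphemes and affixes; the names MULTI_LETTER_GRAPHEMES, PREFIXES, SUFFIXES, segment_graphemes, and segment_morphemes are hypothetical and do not come from any library. Real English spelling and morphology need far larger inventories and, in practice, dictionary- or model-based analysis.

```python
# Toy segmenters illustrating graphemes and morphemes.
# The inventories below are hypothetical, hand-picked subsets for illustration only.
MULTI_LETTER_GRAPHEMES = ("oo", "ee", "sh", "ch", "th", "ng")
PREFIXES = ("un", "re")
SUFFIXES = ("able", "ness")


def segment_graphemes(word: str) -> list[str]:
    """Greedily split a word into graphemes, preferring known two-letter graphemes."""
    graphemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in MULTI_LETTER_GRAPHEMES:
            graphemes.append(pair)  # two letters spelling one sound, e.g. "oo"
            i += 2
        else:
            graphemes.append(word[i])  # fall back to a single letter
            i += 1
    return graphemes


def segment_morphemes(word: str) -> list[str]:
    """Strip at most one known prefix and one known suffix; what remains is the root."""
    prefix, suffix = "", ""
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p):
            prefix, word = p, word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s):
            suffix, word = s, word[:-len(s)]
            break
    # Drop empty slots so words without affixes return just the root.
    return [m for m in (prefix, word, suffix) if m]


if __name__ == "__main__":
    print(segment_graphemes("spoon"))        # ['s', 'p', 'oo', 'n']
    print(segment_morphemes("unbreakable"))  # ['un', 'break', 'able']
```

Greedy matching like this is easy to fool: because "re" is in the prefix list, segment_morphemes("read") returns ['re', 'ad'], which is why practical morphological analyzers check candidate roots against a lexicon rather than stripping affixes blindly.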
Now, let's look at some practical aspects that form the basis of every NLP-based system.