In this chapter, we'll study and implement common natural language processing (NLP) algorithms, which help us build machines that can automatically analyze and understand human text and speech in context. Specifically, we'll cover the following two classes of NLP algorithms:
- Feature transformers, including the following:
  - Tokenization
  - Stemming
  - Lemmatization
  - Normalization
- Feature extractors, including the following:
  - Bag of words
  - Term frequency–inverse document frequency (TF-IDF)
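As a preview, the following minimal Python sketch chains several of these steps on a toy corpus using only the standard library. It is an illustration under stated assumptions, not the implementation developed later in the chapter: the regex tokenizer, the suffix-stripping `crude_stem` (a toy, not the Porter stemmer), the hand-written `TOY_LEMMAS` table, and all function names are hypothetical. Normalization is reduced here to lowercasing, and TF-IDF uses one common variant, `(count / document length) * log(N / df)`.

```python
import math
import re
from collections import Counter

# --- Feature transformers ---

def tokenize(text: str) -> list[str]:
    """Split raw text into word tokens with a simple regex (letters, digits, apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", text)

def normalize(tokens: list[str]) -> list[str]:
    """One common normalization step: lowercase every token."""
    return [t.lower() for t in tokens]

def crude_stem(token: str) -> str:
    """Toy suffix stripper (NOT the Porter stemmer): drops a few English suffixes."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when enough of the word remains to be meaningful.
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

# Hypothetical hand-written lemma table; real lemmatizers consult a lexicon
# (for example, WordNet) and usually need part-of-speech information.
TOY_LEMMAS = {"ran": "run", "running": "run", "cats": "cat", "are": "be"}

def lemmatize(token: str) -> str:
    """Map a token to its dictionary form if the toy table knows it."""
    return TOY_LEMMAS.get(token, token)

# --- Feature extractors ---

def bag_of_words(tokens: list[str]) -> Counter:
    """Count each token in one document; word order is discarded."""
    return Counter(tokens)

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by tf(t, d) * log(N / df(t)), with tf = count / doc length."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {term: (c / total) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return weights

# --- Usage on a toy corpus ---
corpus = ["The cats are running.", "A cat ran home.", "Dogs are running home."]
processed = [[crude_stem(t) for t in normalize(tokenize(doc))] for doc in corpus]
bow_doc0 = bag_of_words(processed[0])
weights = tf_idf(processed)
# The toy stemmer yields a non-word, while the lemma table returns the dictionary form.
stem_vs_lemma = (crude_stem("running"), lemmatize("running"))

print(processed[0])   # ['the', 'cat', 'are', 'runn']
print(stem_vs_lemma)  # ('runn', 'run')
```

Because "the" occurs in only one document of this corpus, its TF-IDF weight in the first document is higher than that of "cat", which appears in two; a term present in every document would receive a weight of 0, since log(N / N) = 0.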