Chapter 9: Natural Language Processing
Terabytes of text data are created every day by users of all sorts of software, from enterprise systems to social networks. Most of this data is never processed, yet it holds many opportunities to improve how businesses work.
In this chapter, we will learn how to clean and process text data so that we can extract features from it and use them as input to machine learning models.
The topics we will be covering in this chapter are as follows:
- Natural language processing
- Removing unwanted strings
- Stemming and lemmatization
- word_tokenizer
- Feature extraction from text
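
To give a feel for how these topics fit together, here is a minimal sketch of the overall flow: removing unwanted strings, splitting text into tokens, and turning the tokens into simple count features. It uses only the Python standard library. The function names (`clean_text`, `tokenize`, `bag_of_words`) and the sample sentence are made up for illustration and are not the tools used later in the chapter. Stemming and lemmatization are left out because they normally rely on a dedicated NLP library.

```python
import re
from collections import Counter


def clean_text(text):
    """Remove unwanted strings (HTML-like tags and URLs) and lowercase the text."""
    text = re.sub(r"<[^>]+>", " ", text)        # drop anything that looks like a tag
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    return text.lower()


def tokenize(text):
    """Split cleaned text into word tokens (runs of the letters a-z only)."""
    return re.findall(r"[a-z]+", text)


def bag_of_words(tokens):
    """Count how often each token occurs; the counts act as simple features."""
    return Counter(tokens)


# Hypothetical sample input, used only for this illustration.
sample = "<p>Users create data. Data creates opportunities: https://example.com</p>"

tokens = tokenize(clean_text(sample))
features = bag_of_words(tokens)

print(tokens)
print(features.most_common(1))
```

Note that `create` and `creates` are counted as two different tokens here. Reducing such variants to a common form is what stemming and lemmatization are for.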