In this chapter, we looked at the various steps that are needed to build a natural language vocabulary. These play the most critical role in preprocessing any natural language data. Data preprocessing is probably one of the most important aspects of any machine learning application, and the same applies to NLP as well. When performed properly, these steps help with the machine learning aspects that generally occur after preprocessing the data, consequently providing better results most of the time compared with scenarios where no preprocessing is involved.
In the next chapter, we will use the techniques discussed in this chapter to preprocess data and subsequently build mathematical representations of text that can be understood by machine learning algorithms.