Summary
In this chapter, we covered the functions Optimus provides to clean and prepare text data so that you can start your NLP journey. These range from simple operations, such as removing stopwords and URLs, to more advanced ones, such as stemming and lemmatization.
We also learned how to tokenize and tag the text in our datasets to capture information from it efficiently.
After that, we explored a couple of methods for extracting features from text. We saw how to use bag of words and TF-IDF to convert text into numbers that can be used as input to machine learning algorithms.
In the next chapter, we will cover what we consider to be Optimus' most advanced features, such as implementing your own engine, creating custom data transformation functions, and even plotting.