Summary
This chapter followed the path established in the previous chapter, further focusing on more advanced techniques for solving the topic classification problem.
Specifically, we saw how to extend the exploratory data analysis phase using different plot types to help make informed decisions. In this context, we had the opportunity to learn algorithms for dimensionality reduction, either for visualization or feature selection.
Then, we incorporated two supervised ML algorithms and introduced a novel representation of the text data based on word embedding. This representation was put into operation using our custom classifiers and an open source tool. The next chapter deals with another typical problem in NLP: how to perform sentiment analysis on a text corpus.