Test your knowledge
Practice some of what we've learned here on new data. You might collect some social media data using an API, such as from Reddit as we did in Chapter 7, and apply basic analysis (such as word frequency plots), sentiment analysis, and topic modeling. You might also train your own sentiment or emotion classifier using a public dataset. If you create your own sentiment classifier, you can extract document vectors from the text and use them as features, which might give better results than TF-IDF vectors. However, be careful to train on data similar to the data you will apply the classifier to (for example, train the classifier on social media data if that is the application).
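As a starting point for the basic analysis step, here is a minimal sketch of counting word frequencies with only the Python standard library. The `posts` list, the `STOPWORDS` set, and the `word_counts` function are hypothetical names invented for illustration; the sample posts are made-up stand-ins for text you would collect yourself, and the stopword list is deliberately tiny.

```python
import re
from collections import Counter

# Hypothetical stand-in posts; replace these with text collected from an API.
posts = [
    "I love this new phone, the camera is great",
    "The camera is terrible and the battery is worse",
    "Great battery life, love it",
]

# A tiny illustrative stopword list (an assumption); a real analysis
# would use a fuller list from an NLP library.
STOPWORDS = {"i", "the", "is", "and", "it", "this", "a"}


def word_counts(texts):
    """Count lowercase word tokens across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


counts = word_counts(posts)
# The three most frequent words, as (word, count) pairs.
top_words = counts.most_common(3)
```

The `(word, count)` pairs in `top_words` can be passed to a plotting library to draw a frequency bar chart, and the same tokenized text can then feed the sentiment analysis and topic modeling steps.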