Sentiment analysis of tweets is particularly hard because of Twitter's per-message length limit. This limit leads to a special syntax, creative abbreviations, and sentences that are seldom well formed. The typical approach of analyzing sentences, aggregating their sentiment information per paragraph, and then calculating the overall sentiment of a document does not work here.
Clearly, we will not try to build a state-of-the-art sentiment classifier. Instead, we want to do the following:
- Use this scenario as a vehicle to introduce yet another classification algorithm, Naïve Bayes
- Explain how part-of-speech (POS) tagging works and how it can help us
- Show some more tricks from the scikit-learn toolbox that can come in handy