Sketching our roadmap
Sentiment analysis of tweets is particularly hard because of Twitter's 140-character limit. This leads to a special syntax, creative abbreviations, and sentences that are seldom well formed. The typical approach of analyzing sentences, aggregating their sentiment per paragraph, and then calculating the overall sentiment of a document therefore does not work here.
Clearly, we will not try to build a state-of-the-art sentiment classifier. Instead, we want to:
Use this scenario as a vehicle to introduce yet another classification algorithm: Naive Bayes
Explain how part-of-speech (POS) tagging works and how it can help us
Show some more tricks from the scikit-learn toolbox that come in handy from time to time