Sentiment analysis
Twitter is one of the most popular social media platforms and an important communication tool for individuals and companies alike.
Capturing sentiment in language is particularly important for companies: a positive tweet can go viral and spread the word, while a particularly negative one can cause harm. Because human language is complicated, it is not enough to decide on the sentiment; we also want to investigate how it arises: which words actually led to the sentiment label?
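To make the idea of "which words led to the sentiment" concrete, here is a deliberately simplified sketch. It assumes a tiny, hypothetical hand-made lexicon (`TOY_LEXICON`) and a hypothetical helper `score_with_attribution`; neither is part of the competition or of any library. It returns both an overall label and the words that contributed to it. Real models learn these signals from data rather than from a fixed word list.

```python
# Toy word-level sentiment attribution.
# ASSUMPTION: TOY_LEXICON is a tiny, hypothetical lexicon for illustration only.
TOY_LEXICON = {
    "love": 1.0,
    "great": 0.8,
    "happy": 0.7,
    "hate": -1.0,
    "awful": -0.9,
    "sad": -0.6,
}


def score_with_attribution(text: str) -> tuple[str, list[tuple[str, float]]]:
    """Return an overall label plus the (word, weight) pairs that produced it."""
    # Lowercase and strip simple punctuation so "great!" matches "great".
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    # Keep only the words the lexicon knows about: these are the "evidence".
    contributions = [(t, TOY_LEXICON[t]) for t in tokens if t in TOY_LEXICON]
    total = sum(weight for _, weight in contributions)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, contributions


label, evidence = score_with_attribution("I love this phone, the camera is great!")
print(label)     # positive
print(evidence)  # [('love', 1.0), ('great', 0.8)]
```

The point is the return type rather than the scoring rule: a useful sentiment system should expose its evidence alongside its verdict.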
We will demonstrate an approach to this problem by using data from the Tweet Sentiment Extraction competition (https://www.kaggle.com/c/tweet-sentiment-extraction). For brevity, we have omitted the imports from the following code, but you can find them in the corresponding Notebook in the GitHub repo for this chapter.
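In this competition, the words that support a tweet's sentiment are given as a contiguous piece of the tweet itself (to our understanding, in a column called `selected_text` alongside the full `text`). A common way to turn that into a learning target is to locate the span's start and end word positions. The sketch below assumes a clean, whitespace-separated span; `span_to_word_indices` is a hypothetical helper, not part of the competition tooling, and real data may need more careful handling of punctuation and extra spaces.

```python
def span_to_word_indices(text: str, selected_text: str) -> tuple[int, int]:
    """Return inclusive (start, end) word indices of selected_text inside text.

    ASSUMPTION: selected_text is a contiguous run of whitespace-separated
    words taken from text (hypothetical helper for illustration).
    """
    words = text.split()
    target = selected_text.split()
    n = len(target)
    # Slide a window of n words across the tweet and look for an exact match.
    for start in range(len(words) - n + 1):
        if words[start:start + n] == target:
            return start, start + n - 1
    raise ValueError("selected_text is not a contiguous word span of text")


text = "my day was awful but the pizza was great"
start, end = span_to_word_indices(text, "pizza was great")
print(start, end)                   # 6 8
print(text.split()[start:end + 1])  # ['pizza', 'was', 'great']
```

Framing the target as a (start, end) pair lets a model predict two positions per tweet instead of generating free text.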
To get a better feel for the problem, let’s start by looking at the data:
df = pd.read_csv('/kaggle/input/tweet-sentiment...