To summarize the process required to use an NLP supervised ML model for sentiment analysis, I have created the following diagram, which shows the elements in a logical progression indicated by the letters A through E:
The process begins with our source Unstructured Input Data, which is represented by the letter A in the preceding diagram. Because unstructured data arrives in many formats and forms, such as a tweet, a sentence, or a paragraph, we need to perform extra steps before we can work with the data and gain any insights from it.
The next element, Text Normalization, is represented by the letter B in the preceding diagram. It involves concepts such as tokenization, n-grams, and bag-of-words (BoW), which were introduced in Chapter 10, Exploring Text Data and Unstructured Data. Let's explore them in more detail so that we can learn how they are applied in sentiment analysis. BoW is when a string of text...
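As a rough illustration of the three normalization concepts named here, the following sketch uses only the Python standard library. The helper names (tokenize, ngrams, bag_of_words), the regular expression, and the sample sentence are all assumptions made for this example; they are not taken from Chapter 10, and a library such as NLTK would normally handle tokenization more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    A simple regex split, assumed here for illustration only.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Count how often each token occurs, ignoring word order."""
    return Counter(tokens)


# Hypothetical input sentence, standing in for a tweet or review.
sample = "The food was great and the service was great"

tokens = tokenize(sample)    # individual words
bigrams = ngrams(tokens, 2)  # pairs of adjacent words
bow = bag_of_words(tokens)   # word -> frequency

print(tokens)
print(bigrams[:3])
print(bow["great"], bow["food"])
```

Note that the bag-of-words count discards word order entirely, while the bigrams keep a little local context; that trade-off is one reason both representations appear in sentiment analysis work.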