Introduction
Modeling based on structured data gathered via a controlled experiment (as we did in previous chapters) is relatively straightforward. In the real world, however, we rarely deal with such neatly structured data. This is especially true when we want to understand human-generated feedback or analyze a newspaper article.
Natural Language Processing (NLP) is a field at the intersection of computer science, statistics, and linguistics that aims to process human language (I consciously did not use the word "understanding") and extract features that can be used in modeling. Among other tasks, NLP techniques let us find the most frequently occurring words in a text to roughly identify its topic, identify the names of people and places, find the subjects and objects of a sentence, or analyze the sentiment of someone's feedback.
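To make the first of these tasks concrete, here is a minimal sketch of counting the most frequent words in a text using only the Python standard library. It assumes a crude regular-expression tokenizer and a small, hypothetical stop-word list (`STOP_WORDS`) chosen purely for illustration; the function name `most_common_words` is likewise hypothetical. It is not necessarily how the recipes in this chapter do it, since a dedicated NLP library would usually supply a proper tokenizer and a fuller stop-word list.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list for illustration only.
# Stop words are very frequent but say little about a text's topic.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is",
              "are", "was", "for", "that", "this", "it", "with", "as", "by"}


def most_common_words(text, n=5):
    """Return the n most frequent non-stop-words in text as (word, count) pairs."""
    # Crude tokenization (an assumption): lowercase the text and keep runs
    # of letters and apostrophes, which drops punctuation and digits.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Remove stop words so the remaining counts hint at the topic.
    content_words = [t for t in tokens if t not in STOP_WORDS]
    # Counter.most_common sorts by count; ties keep first-encountered order.
    return Counter(content_words).most_common(n)


if __name__ == "__main__":
    sample = ("The cat sat on the mat. "
              "The cat chased the dog, and the dog chased the cat.")
    for word, count in most_common_words(sample, n=3):
        print(word, count)
```

On the toy sample, "cat" comes out on top with three occurrences, followed by "chased" and "dog". Removing stop words is what makes the counts informative: without that step, "the" would dominate almost any English text.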
In this set of recipes, we will use two datasets. We will read the first one off the Seattle Times website: the Obama moves to require background...