A large share of relevant information originates in unstructured form, and text data is a major part of it. Text analysis, Natural Language Processing (NLP), Information Retrieval (IR), and Statistical Learning (SL) are some of the areas focused on developing techniques and processes to deal with this data. These techniques and processes discover and present knowledge, facts, business rules, and relationships, among others, that are otherwise locked in textual form, impenetrable to automated processing.
Given the explosion of textual data in recent years, an important skill for analysts such as statisticians and data scientists is the ability to work efficiently with this data and find the insights they are looking for. In this chapter, we will try to predict whether a customer is going to make repeated purchases given...