Text Data Processing
Before we build machine learning models for our textual data, we need to process it. First, we will learn different ways to understand what the data comprises. This gives us a sense of what the data really contains and helps us decide which preprocessing techniques to use in the next step. Next, we will learn the techniques used to preprocess the data. Preprocessing reduces the size of the data, which shortens training time, and transforms it into a form from which machine learning algorithms can extract information more easily. Finally, we will learn how to convert the textual data into numbers so that machine learning algorithms can actually use it to build models. We do this using word embedding, much like the entity embedding we performed in Chapter 5: Mastering Structured Data.
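The three stages described above (understand the data, preprocess it, convert it to numbers) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: the toy corpus, the tiny stop-word list, and the helper names `explore`, `preprocess`, `build_vocab`, `encode`, `make_embedding_table`, and `embed` are all hypothetical. The embedding table here is filled with random numbers. In a real model it would be a trainable layer learned together with the rest of the network, as with the entity embeddings from Chapter 5. The `preprocess` helper uses a regular expression, the subject of the next section.

```python
import random
import re
from collections import Counter

# Hypothetical toy corpus, used only for illustration.
corpus = [
    "The movie was great, and the acting was great too!",
    "The plot was boring.",
]

# Assumption: a small hand-picked stop-word list; real projects usually
# take one from an NLP library instead.
STOP_WORDS = {"the", "was", "and", "too", "a", "an"}


def explore(docs):
    """Stage 1: get a feel for the raw data (naive whitespace tokens)."""
    tokens = [t for d in docs for t in d.lower().split()]
    return {
        "num_docs": len(docs),
        "num_tokens": len(tokens),
        "vocab_size": len(set(tokens)),  # "great," and "great" count separately
        "most_common": Counter(tokens).most_common(3),
    }


def preprocess(doc):
    """Stage 2: lowercase, keep alphabetic words only, drop stop words."""
    words = re.findall(r"[a-z]+", doc.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_vocab(token_lists):
    """Stage 3a: map each distinct word to an integer index.

    Index 0 is reserved for padding and unknown words (an assumption of this sketch).
    """
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def encode(tokens, vocab):
    """Replace each word with its index; unseen words map to 0."""
    return [vocab.get(tok, 0) for tok in tokens]


def make_embedding_table(num_rows, dim, seed=0):
    """Stage 3b: one dense vector per index; random here, learned in practice."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(num_rows)]


def embed(ids, table):
    """Look up the dense vector for each integer index."""
    return [table[i] for i in ids]


# Usage: run the three stages end to end on the toy corpus.
print(explore(corpus))
cleaned = [preprocess(d) for d in corpus]
vocab = build_vocab(cleaned)
ids = [encode(tokens, vocab) for tokens in cleaned]
table = make_embedding_table(len(vocab) + 1, dim=4)
print(cleaned)
print(vocab)
print(ids)
print(embed(ids[0], table))
```

On this corpus, the naive tokenization in `explore` finds 10 distinct tokens, while preprocessing leaves a vocabulary of 5 words. This small-scale version of the size reduction mentioned above comes mostly from dropping stop words and merging `great,` with `great`.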
Regular Expressions
Before we start working on textual data, we need to learn about regular expressions (RegEx)....