Until this point, we have been working with categorical and numerical data. Although our categorical data has come in the form of strings, each string has represented a single category. We will now dive deeper into longer-form text data. This form of text data is much more complex than single-category text, because each piece of text is now a series of categories, or tokens.
Before we get any further into working with text data, let's make sure we have a good understanding of what we mean when we refer to text data. Consider a service like Yelp, where users write reviews of restaurants and businesses to share their thoughts on their experience. These reviews, all written as free text, contain a wealth of information that would be useful for machine learning, for example in predicting the best restaurant to visit.
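To make the contrast between single-category text and longer-form text concrete, here is a minimal sketch in Python. The review text and the variable names are invented for illustration, and splitting on whitespace is only a naive stand-in for a real tokenizer, which would also deal with punctuation and capitalization.

```python
# A single-category value: the whole string is one label.
cuisine = "Italian"

# A hypothetical review, invented for illustration (not real Yelp data).
review = "Great pasta and friendly staff, but the wait was long."

# Naive tokenization: split on whitespace. This is an assumption made
# for simplicity; note that punctuation stays attached to the words.
tokens = review.split()

print("Category:", cuisine)
print("Number of tokens in review:", len(tokens))
print("Tokens:", tokens)
```

The categorical value stays a single unit, whereas the review becomes a sequence of tokens, each of which may carry information.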
In...