Using LIME for NLP
At the beginning of the chapter, we set aside training and test datasets with the cleaned-up contents of all the “tastes” columns for NLP. We can take a peek at the test dataset for NLP, as follows:
print(X_test_nlp)
This outputs the following:
1194 roasty nutty rich
77 roasty oddly sweet marshmallow
121 balanced cherry choco
411 sweet floral yogurt
1259 creamy burnt nuts woody
...
327 sweet mild molasses bland
1832 intense fruity mild sour
464 roasty sour milk note
2013 nutty fruit sour floral
1190 rich roasty nutty smoke
Length: 734, dtype: object
No machine learning model can ingest the data as text, so we need to turn it into a numerical format—in other words, vectorize it. There are many techniques we can use to do this. In our case, we are not interested in the position of words...