Summary
In this chapter, we learned how to work with text data, exploring it with a variety of approaches. We started by analyzing the target and the text data, then preprocessed the text so it could be fed into a machine learning model. We also explored various NLP tools and techniques, including topic modeling, NER, and POS tagging. We then prepared the text to build a baseline model, working through an iterative process to gradually improve data quality for the objective at hand: in this case, improving the coverage of word embeddings for the vocabulary in the competition dataset's text corpus.
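To make the embedding-coverage objective concrete, here is a minimal sketch of the kind of check that drives that iterative cleaning loop: it measures what fraction of the corpus vocabulary (and of all word occurrences) is found in a pretrained embedding, and surfaces the most frequent out-of-vocabulary words to fix next. The corpus, the stand-in embedding dictionary, and the function name are illustrative assumptions, not code from the chapter.

```python
from collections import Counter

def embedding_coverage(texts, embeddings):
    """Report coverage of a pretrained embedding over a corpus:
    the share of unique words covered, the share of all token
    occurrences covered, and the out-of-vocabulary words."""
    vocab = Counter(word for text in texts for word in text.lower().split())
    known = {w: c for w, c in vocab.items() if w in embeddings}
    vocab_cov = len(known) / len(vocab)              # unique-word coverage
    text_cov = sum(known.values()) / sum(vocab.values())  # token coverage
    # OOV words, most frequent first: the best candidates for cleaning rules
    oov = sorted(set(vocab) - set(known), key=lambda w: -vocab[w])
    return vocab_cov, text_cov, oov

# Toy corpus and a stand-in "embedding" (only the keys matter here).
texts = ["The quick brown fox", "the lazy dog sleeps"]
embeddings = {"the": None, "quick": None, "lazy": None, "dog": None}
vocab_cov, text_cov, oov = embedding_coverage(texts, embeddings)
print(f"{vocab_cov:.1%} of vocab, {text_cov:.1%} of all tokens covered")
print("most frequent OOV words:", oov)
```

Each cleaning step (lowercasing, stripping punctuation, fixing contractions, and so on) can be judged by whether it raises these two numbers.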
We introduced and discussed a baseline model (based on the work of several Kaggle contributors) whose architecture combines a word embedding layer with bidirectional LSTM layers. Finally, we looked at some of the more advanced solutions available, based on Transformer architectures, used either as single models or combined, to make a late submission...
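As a reminder of the baseline's shape, the following is a minimal PyTorch sketch of an embedding layer feeding bidirectional LSTM layers with a pooled classification head. It assumes PyTorch is available; all hyperparameters and class names are placeholders, not the exact values from the chapter's baseline.

```python
import torch
import torch.nn as nn

class BiLSTMBaseline(nn.Module):
    """Illustrative baseline: embedding -> stacked BiLSTM -> max-pool -> linear."""
    def __init__(self, vocab_size=20000, embed_dim=300, hidden_dim=64, num_classes=1):
        super().__init__()
        # In the chapter's setup, this layer would be initialized
        # with the pretrained word embeddings whose coverage we improved.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids)     # (batch, seq_len, embed_dim)
        out, _ = self.lstm(x)             # (batch, seq_len, 2 * hidden_dim)
        pooled = out.max(dim=1).values    # max-pool over the time dimension
        return self.fc(pooled)            # (batch, num_classes) logits

model = BiLSTMBaseline()
logits = model(torch.randint(0, 20000, (4, 50)))  # a batch of 4 token sequences
```

A Transformer-based solution of the kind mentioned above would replace the embedding and LSTM layers with a pretrained encoder, keeping a similar classification head on top.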