Summary
In this chapter, we took a deeper dive into word embeddings and their applications. We demonstrated how embeddings can be trained using a continuous bag-of-words (CBOW) model, and how n-gram language modeling helps capture the relationships between words in a sentence. We then looked at tokenization, splitting documents into individual tokens for easy processing, and at how tagging and chunking can be used to identify parts of speech. Finally, we showed how TF-IDF weightings can be used to better represent documents in embedding form.
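As a brief recap of the TF-IDF idea, the sketch below computes a weight for a term from scratch using only the standard library. The corpus, function name, and the smoothed IDF formula (log of corpus size over one plus document frequency) are illustrative assumptions, not code from the chapter:

```python
import math

# Hypothetical toy corpus: each document is a list of tokens
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "dogs and cats are pets".split(),
]

def tf_idf(term, doc, corpus):
    # Term frequency: proportion of the document made up of this term
    tf = doc.count(term) / len(doc)
    # Inverse document frequency: penalise terms that appear in many documents
    # (smoothed with +1 in the denominator; other variants exist)
    n_containing = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / (1 + n_containing))
    return tf * idf

# A word common to most documents ("the") receives a weight near zero,
# while a rarer, more distinctive word ("mat") receives a positive weight
print(tf_idf("the", docs[0], docs))
print(tf_idf("mat", docs[0], docs))
```

Weighting each word's embedding by its TF-IDF score before averaging is one common way to down-weight uninformative words in a document representation.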
In the next chapter, we will see how to use NLP for text preprocessing, stemming, and lemmatization.