Chapter 3: NLP and Text Embeddings
There are many different ways of representing text in deep learning. We have already covered basic bag-of-words (BoW) representations, but there is a far more sophisticated way of representing text data, known as embeddings. A BoW vector only counts how often each word occurs within a sentence. Embeddings, by contrast, represent each word as a vector of numbers that numerically encodes its meaning, so that words with similar meanings end up with similar vectors.
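To make the contrast concrete, here is a minimal Python sketch. It assumes simple whitespace tokenization and a hand-picked six-word vocabulary. The `bag_of_words` and `cosine_similarity` helpers are hypothetical names used only for illustration. The three-dimensional embedding values are invented for this example and were not learned by any model.

```python
from collections import Counter
import math

def bag_of_words(sentence, vocab):
    """Return a count vector with one entry per vocabulary word."""
    counts = Counter(sentence.lower().split())  # naive whitespace tokenization
    return [counts[word] for word in vocab]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

vocab = ["the", "cat", "dog", "sat", "on", "mat"]
bow_sentence = bag_of_words("The cat sat on the mat", vocab)
print(bow_sentence)  # [2, 1, 0, 1, 1, 1]

# In BoW, "cat" and "dog" occupy separate positions, so they share nothing.
bow_cat = bag_of_words("cat", vocab)
bow_dog = bag_of_words("dog", vocab)
print(cosine_similarity(bow_cat, bow_dog))  # 0.0

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
# Real embeddings are learned from data (e.g. by CBOW) and are usually much larger.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],
    "mat": [0.1, 0.2, 0.9],
}
sim_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
sim_cat_mat = cosine_similarity(embeddings["cat"], embeddings["mat"])
print(sim_cat_dog > sim_cat_mat)  # True: "cat" sits closer to "dog" than to "mat"
```

Because every word in a BoW vector gets its own position, the vectors for "cat" and "dog" have no overlapping non-zero entries, and their similarity is zero. The toy embeddings place these two words close together instead. Learned embeddings, such as those produced by CBOW, aim to capture this kind of similarity automatically from data.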
In this chapter, we will explore text embeddings and learn how to create them using a continuous bag-of-words (CBOW) model. We will then discuss n-grams and how they can be used within models. We will also cover the ways in which tokenization, tagging, and chunking can be used to split text into its constituent parts. Finally, we will look at TF-IDF and how it can be used to weight our models toward infrequently occurring words.
The following topics will be covered in this chapter:
- Word embeddings
- Exploring CBOW
- Exploring...