Summary
In this chapter, we’ve learned how to select different NLP approaches based on the available data and other requirements. In addition, we’ve learned about representing data for NLP applications. We’ve placed particular emphasis on vector representations, including vector representations of both documents and words. For documents, we’ve covered binary bag of words, count bag of words, and TF-IDF. For words, we’ve reviewed the Word2Vec approach and briefly introduced context-dependent vectors, which will be covered in much more detail in Chapter 11.
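To make the three document representations concrete, here is a minimal sketch in pure Python, using a hypothetical two-document toy corpus (the corpus, function names, and the simple whitespace tokenization are illustrative assumptions, not code from this chapter):

```python
# Sketch of three document representations: binary bag of words,
# count bag of words, and TF-IDF. Toy corpus and names are illustrative.
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Fixed vocabulary shared by all documents, in a stable (sorted) order.
vocab = sorted({word for doc in docs for word in doc.split()})

def count_bow(doc):
    """Count bag of words: how many times each vocabulary term occurs."""
    counts = Counter(doc.split())
    return [counts[term] for term in vocab]

def binary_bow(doc):
    """Binary bag of words: 1 if the term occurs at all, else 0."""
    present = set(doc.split())
    return [1 if term in present else 0 for term in vocab]

def tf_idf(doc):
    """TF-IDF: term frequency scaled by inverse document frequency."""
    tokens = doc.split()
    counts = Counter(tokens)
    n_docs = len(docs)
    vec = []
    for term in vocab:
        tf = counts[term] / len(tokens)
        df = sum(1 for d in docs if term in d.split())
        vec.append(tf * math.log(n_docs / df))
    return vec
```

Note how TF-IDF downweights uninformative terms: "the" occurs in every document of the toy corpus, so its inverse document frequency is log(2/2) = 0 and its TF-IDF weight is zero, while a distinctive word like "cat" gets a positive weight.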
In the next four chapters, we will take the representations that we’ve learned about in this chapter and show how to train models from them that can be applied to different problems such as document classification and intent recognition. We will start with rule-based techniques in Chapter 8, discuss traditional machine learning techniques in Chapter 9, talk about neural networks...