Chapter 3: Representing Text – Capturing Semantics
Representing the meaning of words, phrases, and sentences in a form that computers can work with is one of the pillars of NLP. Machine learning, for example, represents each data point as a fixed-size vector, so we face the question of how to turn words and sentences into vectors. Almost any NLP task starts with representing the text in some numeric form, and this chapter will show several ways of doing that. Once you've learned how to represent text as vectors, you will be able to perform tasks such as classification, which will be described in later chapters.
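To make the idea concrete before diving into the recipes, here is a minimal, pure-Python sketch of turning sentences into fixed-size vectors by counting words over a shared vocabulary (a simple bag-of-words representation, covered in depth later in the chapter; the example sentences are illustrative):

```python
# Two toy sentences to vectorize.
sentences = ["the cat sat on the mat", "the dog sat on the log"]

# Build a shared, sorted vocabulary from all sentences.
vocab = sorted({word for sentence in sentences for word in sentence.split()})

def to_vector(sentence):
    # Count how often each vocabulary word appears in the sentence;
    # every sentence maps to a vector of the same length, len(vocab).
    words = sentence.split()
    return [words.count(word) for word in vocab]

vectors = [to_vector(s) for s in sentences]
print(vocab)       # the shared vocabulary
print(vectors[0])  # the first sentence as a fixed-size count vector
```

Both sentences become vectors of identical length, which is exactly the property machine learning algorithms need.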
We will also learn how to turn phrases such as *fried chicken* into vectors, how to train a word2vec model, and how to create a small search engine with semantic search.
The following recipes will be covered in this chapter:
- Putting documents into a bag of words
- Constructing the N-gram model
- Representing texts...