Introduction
The previous chapters laid a firm foundation for NLP. But now we will go deeper into a key topic – one that gives us surprising insights into how a language works and how some of the key advances in human computer interaction are facilitated. At the heart of NLP is the simple trick of representing text as numbers. This helps software algorithms to perform the sophisticated computations that are required to understand the meaning of text.
Text representation can be as simple as encoding each word as an integer. But it can also include using an array of numbers for each word. Each of these representations help machine learning programs to function effectively.
This chapter begins by discussing vectors, how text can be represented as vectors, and how vectors can be composed to represent complex speech. We will walk through the various representations in both directions – learning how to encode text as vectors as well as how to retrieve text from vectors...