Text contains features that need to be extracted with their context in mind, but processing a whole section of text at once in order to capture that context is very difficult for machines.
In this chapter, we will see how text is represented using N-grams and what role they play in capturing context. We will then look at word embedding, in which words are mapped to vectors of real numbers so that machines can understand and process them more effectively. Because of the sheer volume of text, this can lead to the problem of high dimensionality, so we will finish by seeing how to reduce the dimensionality of these vectors in such a way that the context is preserved.
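To make the idea of word embedding concrete before we dive in, here is a minimal sketch (assuming the gensim library, version 4.x, is installed; the tiny corpus is invented purely for illustration): each word is mapped to a dense vector of real numbers that a machine can compute with.

```python
# A toy illustration: train a tiny word2vec model and inspect the
# real-valued vector that represents a word. The corpus here is
# made up for demonstration; a real model needs far more text.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["the", "cat", "chased", "the", "dog"],
]

# vector_size controls the dimensionality of each word vector
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=42)

print(model.wv["cat"])                    # a 50-dimensional vector of real numbers
print(model.wv.similarity("cat", "dog"))  # cosine similarity between two words
```

With enough training text, words that appear in similar contexts end up with similar vectors, which is exactly the property the dimensionality-reduction techniques later in this chapter will try to preserve.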
In this chapter, we will cover the following topics:
- N-grams
- Word embedding
- GloVe
- word2vec
- Dimensionality reduction
- Principal component analysis
- t-distributed stochastic neighbor embedding