So far in our discussion of AI and deep learning, we've focused on how deeply rooted the field is in fundamental mathematical principles. But what do we do when we're faced with unstructured source data such as text? In the previous chapters, we saw how to convert images into numbers via convolutions; how do we do the same thing with text? In modern AI systems, we use a technique called word embedding.
Word embedding is not a class of predictive models in itself, but a means of pre-processing text so that it can serve as input to a predictive model, or as an exploratory technique for data mining. It converts words and sentences into vectors of numbers, themselves called word embeddings. The document, or group of documents, used to train an embedding algorithm is called a corpus, and it is from this corpus that the embeddings are learned.
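To make the corpus-to-vector idea concrete, here is a minimal sketch in plain Python (the toy corpus and variable names are invented for illustration). It builds a vocabulary from a small corpus and represents each word by its row of sentence-level co-occurrence counts. Real embedding algorithms such as word2vec learn far denser, lower-dimensional vectors, but the pipeline is the same: a corpus goes in, and a vector per word comes out.

```python
from itertools import combinations

# A hypothetical toy corpus: each string is one "document".
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

# Build the vocabulary: every distinct word gets an integer index.
vocab = sorted({word for sentence in corpus for word in sentence.split()})
index = {word: i for i, word in enumerate(vocab)}

# Count co-occurrences: two distinct words co-occur each time they
# appear together in the same sentence.
cooc = [[0] * len(vocab) for _ in vocab]
for sentence in corpus:
    words = sentence.split()
    for w1, w2 in combinations(words, 2):
        if w1 != w2:
            cooc[index[w1]][index[w2]] += 1
            cooc[index[w2]][index[w1]] += 1

def embedding(word):
    """Return a word's co-occurrence row as its vector of numbers."""
    return cooc[index[word]]
```

With this sketch, `embedding("cat")` is a vector with one entry per vocabulary word; words that appear in similar sentences end up with similar count patterns, which is the intuition that more sophisticated embedding methods refine.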