Latent Semantic Indexing (LSI, also called Latent Semantic Analysis) sets out to improve the results of queries that would otherwise omit relevant documents containing synonyms of the query terms. It aims to model the relationships between documents and terms so that it can predict that a term should be associated with a document even though, because of variability in word use, no such association was observed.
LSI uses linear algebra to find a given number, k, of latent topics by decomposing the DTM. More specifically, it uses Singular Value Decomposition (SVD) to find the best rank-k approximation of the DTM using the k largest singular values and their corresponding vectors. In other words, LSI applies the unsupervised dimensionality reduction techniques we encountered in Chapter 12, Unsupervised Learning, to the text representation that we covered in Chapter 13, Working with...