Summary
In this chapter, we learned how LSI was developed based on SVD. We learned that a large document-term matrix can be decomposed into three matrices through SVD. To understand SVD, we also reviewed a few basic properties of matrix operations and transformation matrices, as well as eigenvectors and eigenvalues. Finally, we applied SVD to real data to observe the outcome.
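The three-matrix decomposition recapped above can be sketched in a few lines of NumPy. The toy document-term matrix and the choice of k below are illustrative assumptions, not the chapter's dataset:

```python
import numpy as np

# A hypothetical document-term matrix: 4 documents x 5 terms (word counts).
A = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [0, 0, 1, 2, 1],
    [0, 0, 0, 1, 2],
], dtype=float)

# SVD factors A into three matrices: A = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# The product of the three factors reconstructs A (up to floating-point error).
assert np.allclose(A, U @ np.diag(S) @ Vt)

# Keeping only the top-k singular values yields the rank-k approximation
# that LSI uses as its reduced semantic space.
k = 2
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
```

Truncating to the largest k singular values is what turns a plain SVD into LSI: the k retained dimensions act as latent topics.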
Gensim packages LSI into a few lines of code for efficient production use. While this chapter walked you through the theoretical construction of LSI, Chapter 6, Latent Semantic Indexing with Gensim, will teach you how to build an LSI model for production. Before that, however, there is one important NLP concept you should learn: cosine similarity. It is a fundamental concept used extensively throughout NLP, including in modern word embeddings and large language models. Let’s move on to the next chapter.
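As a preview of that concept, cosine similarity measures the cosine of the angle between two vectors, so documents with similar term distributions score close to 1 regardless of their lengths. A minimal sketch, using hypothetical term-count vectors rather than any dataset from this book:

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: dot product divided by
    # the product of their lengths. Ranges from -1 to 1 in general;
    # for non-negative count vectors, from 0 to 1.
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical term-count vectors for three short documents.
doc_a = [2, 1, 0, 0]
doc_b = [1, 2, 0, 0]
doc_c = [0, 0, 1, 2]

print(cosine_similarity(doc_a, doc_b))  # high: shared vocabulary
print(cosine_similarity(doc_a, doc_c))  # zero: no terms in common
```

Because it depends only on direction, not magnitude, cosine similarity is a natural fit for comparing documents of different lengths in the LSI space built in the next chapters.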