Latent Semantic Indexing with Gensim
In Chapter 4, Latent Semantic Indexing with scikit-learn, we learned how LSI is constructed from SVD and used scikit-learn to perform LSI. We also mentioned that the Gensim library makes LSI available in a few lines of code for production purposes. In this chapter, we will build an LSI model with Gensim, learn how to determine the right number of topics, and put the model to real use as a search engine. This production-oriented perspective will help data scientists from non-NLP areas consider stepping into the NLP world.
This chapter covers the following topics:
- Performing text preprocessing
- Performing text representation with BoW and TF-IDF
- Modeling with Gensim
- Using the coherence score to find the optimal number of topics
- Understanding the final model
- Using the model as an information retrieval tool
After completing this chapter, you will be able...