Summary
LDA, BERTopic, and their variants are the prevailing techniques for topic modeling. Motivated by the fact that Transformer-based models produce richer embeddings, BERTopic adopts BERT embeddings to better capture the semantic relationships between words and documents. BERTopic follows a modular design that chains five components: BERT, UMAP, HDBSCAN, c-TF-IDF, and MMR. We learned about the advantages BERTopic gains from each of these components, and how to build, interpret, and visualize a BERTopic model.
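As a recap of how the five modules fit together, here is a minimal sketch that wires them up explicitly with the BERTopic Python library. The embedding model name, hyperparameter values, and sample corpus are illustrative assumptions rather than the exact settings used in this chapter.

```python
from sklearn.datasets import fetch_20newsgroups
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN
from bertopic import BERTopic
from bertopic.representation import MaximalMarginalRelevance

# Sample corpus (assumption: any list of strings works here).
docs = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes")).data

# 1. BERT: a sentence-transformer model producing document embeddings.
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
# 2. UMAP: reduce the embeddings to a low-dimensional space.
umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric="cosine")
# 3. HDBSCAN: cluster the reduced embeddings into topics.
hdbscan_model = HDBSCAN(min_cluster_size=15, metric="euclidean", prediction_data=True)
# 4. c-TF-IDF is built into BERTopic; 5. MMR diversifies the topic words.
representation_model = MaximalMarginalRelevance(diversity=0.3)

topic_model = BERTopic(
    embedding_model=embedding_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    representation_model=representation_model,
)

# Fit the model, then inspect and visualize the resulting topics.
topics, probs = topic_model.fit_transform(docs)
print(topic_model.get_topic_info().head())   # topic sizes and labels
print(topic_model.get_topic(0))              # top words of topic 0
fig = topic_model.visualize_topics()         # intertopic distance map
fig.write_html("topics.html")
```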
In the next chapter, we will survey several real-world NLP applications in healthcare, clinical texts, legal documents, finance, and social media. The chapter aims to inspire solutions to your own NLP challenges. I will also introduce an application that compares BERTopic and LDA modeling results, so you can see how the two approaches differ.