Topic modeling using BERTopic
In this recipe, we will explore the BERTopic package, which provides a versatile set of tools for topic modeling and visualization. It is especially useful if you want to visualize the resulting topic clusters in different ways. The algorithm encodes the data using BERT-based embeddings, hence the “BERT” in the name. You can learn more about the algorithm and its constituent parts at https://maartengr.github.io/BERTopic/algorithm/algorithm.html.
By default, the BERTopic package uses the HDBSCAN algorithm to create clusters from the data in an unsupervised fashion. You can learn more about how HDBSCAN works at https://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html. However, you can also customize the inner workings of a BERTopic object, either by using a different clustering algorithm or by substituting other custom components into its pipeline. In this recipe, we will use the default settings...
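To make the difference between the default and a customized pipeline concrete, here is a minimal sketch. The helper name build_topic_model and its n_clusters parameter are hypothetical and introduced only for illustration. The sketch assumes the bertopic and scikit-learn packages are installed, and it uses KMeans simply as one example of an alternative clustering algorithm, not as a recommendation.

```python
def build_topic_model(n_clusters=None):
    """Return an unfitted BERTopic model (hypothetical helper).

    n_clusters=None keeps the default pipeline, which clusters with HDBSCAN.
    An integer swaps in scikit-learn's KMeans with that many clusters.
    """
    # Imported inside the function because bertopic pulls in heavy dependencies.
    from bertopic import BERTopic

    if n_clusters is None:
        # Default settings: every pipeline component, including HDBSCAN.
        return BERTopic()

    from sklearn.cluster import KMeans

    # The clustering component is passed through the `hdbscan_model`
    # argument, even when the object supplied is not HDBSCAN.
    return BERTopic(hdbscan_model=KMeans(n_clusters=n_clusters))
```

Calling build_topic_model() with no arguments corresponds to the default settings used in this recipe. The returned model can then be fitted by passing a list of document strings to its fit_transform method.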