Community detection clustering with SBERT
In this recipe, we will use the community detection algorithm included in the SentenceTransformers (SBERT) package. SBERT lets us easily encode sentences with BERT-based models. See the Using BERT and OpenAI embeddings instead of word embeddings recipe in Chapter 3 for a more detailed explanation of how to use sentence transformers.
This algorithm is frequently used to find communities in social media data but can also be used for topic modeling. Its main advantage is speed. It works best on shorter texts, such as social media posts. Unlike LDA, which assigns every text to a topic, it discovers only the main topics in the dataset: texts that do not belong to a sufficiently large group of similar texts are left unclustered. One use of the community detection algorithm is finding duplicate posts on social media.
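To make the idea of discovering only the main topics more concrete, here is a minimal pure-Python sketch of threshold-based community detection. It is a simplified illustration, not the SBERT implementation: the function name detect_communities, its parameters threshold and min_size, and the toy vectors are all assumptions made for this example, and the hand-written vectors stand in for real sentence embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def detect_communities(vectors, threshold=0.75, min_size=2):
    """Hypothetical helper: group vectors whose cosine similarity to a
    center vector is at least `threshold`, keeping only groups with at
    least `min_size` members. Returns lists of indices, largest first."""
    n = len(vectors)
    candidates = []
    for i in range(n):
        # Every vector is a potential center; collect its close neighbors.
        members = [
            j for j in range(n)
            if cosine(vectors[i], vectors[j]) >= threshold
        ]
        if len(members) >= min_size:
            candidates.append(members)

    # Prefer the largest candidate groups.
    candidates.sort(key=len, reverse=True)

    communities = []
    assigned = set()
    for members in candidates:
        # Drop items already placed in an earlier (larger) community.
        remaining = [j for j in members if j not in assigned]
        if len(remaining) >= min_size:
            communities.append(remaining)
            assigned.update(remaining)
    return communities


# Toy stand-ins for sentence embeddings (assumed values, not model output).
toy_vectors = [
    [1.0, 0.1, 0.0],  # 0: first group
    [0.9, 0.2, 0.0],  # 1: first group
    [1.0, 0.0, 0.1],  # 2: first group
    [0.0, 1.0, 0.1],  # 3: second group
    [0.1, 0.9, 0.0],  # 4: second group
    [0.0, 0.0, 1.0],  # 5: not similar to anything else
]

communities = detect_communities(toy_vectors, threshold=0.9, min_size=2)
print(communities)
```

In this sketch, the last vector is not close enough to any other vector, so it does not appear in any community. This is the behavior that distinguishes the approach from methods that place every text in a cluster.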
Getting ready
We will use the SBERT package in this recipe. It is included in the poetry environment. You can also install it together with other packages...