Summary
In this chapter, you learned about the basic concepts of unsupervised learning. We also worked through a specific application of unsupervised learning, topic modeling, using a BERT-based tool called BERTopic. We used the BERTopic package to identify clusters of semantically similar documents and to propose labels for those clusters based on the words they contain, without requiring any manually annotated topic labels.
In the next chapter, Chapter 13, we will look at how to measure the quality of our results using quantitative techniques. Quantitative evaluation is useful in research applications for comparing results with those of previous research, and it is useful in practical applications for ensuring that the techniques being used meet the application’s requirements. Although evaluation was briefly discussed in earlier chapters, Chapter 13 will discuss it in depth. It will include splitting data into training, validation, and test sets, evaluation...