Clustering Speech-to-Text Transcriptions
When dealing with real-world datasets, the most common situation is that they come unlabeled—manually labeling each sample is often unrealistic in terms of time and cost. Therefore, it is imperative to incorporate methods that can handle datasets of this type. Unsupervised learning algorithms are applicable in this case and, in this chapter, we deal with a particular kind for grouping similar data under the same category. Expressly, we incorporate clustering methods that allow the transformation of raw data into possible actionable insights, for instance, identifying the general theme in each cluster.
While the previous chapters focused mainly on supervised learning techniques, we dedicate the current one solely to unsupervised methods. Another differentiation is the creation of the text corpus using speech-to-text technology. Next, as the chapter unfolds, we present hard and soft clustering techniques, providing insight into their...