Text clustering
Text clustering is an unsupervised learning technique that finds and groups similar text objects. The objective is to create clusters that are internally coherent but substantially dissimilar from one another; when similarity is expressed as distance, this means the clusters lie far apart. In simple words, the objects inside a cluster are as similar to each other as possible, while the objects in different clusters are as dissimilar, or as far apart, as possible.
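To make the objective concrete, the following minimal sketch compares similarity inside clusters with similarity across clusters. It is an illustration only, under stated assumptions: the two toy clusters and their sentences are hypothetical, documents are represented as simple bag-of-words counts, and cosine similarity is chosen as the similarity measure (other representations and measures are equally possible). The helper names are invented for this example.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Represent a text as lowercase whitespace-token counts (an assumed, very simple model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_intra_similarity(vectors):
    """Average similarity over all pairs of objects inside one cluster."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


def mean_inter_similarity(vectors_a, vectors_b):
    """Average similarity over all pairs drawn from two different clusters."""
    sims = [cosine_similarity(a, b) for a in vectors_a for b in vectors_b]
    return sum(sims) / len(sims)


# Hypothetical toy data: two hand-made clusters of short sentences.
clusters = {
    "pets": ["the cat chased the mouse", "the dog chased the cat"],
    "finance": ["stock prices rose sharply", "stock markets fell sharply"],
}

vectors = {name: [bag_of_words(d) for d in docs] for name, docs in clusters.items()}

intra = {name: mean_intra_similarity(vecs) for name, vecs in vectors.items()}
inter = mean_inter_similarity(vectors["pets"], vectors["finance"])

for name, value in intra.items():
    print(f"intra-cluster similarity ({name}): {value:.3f}")
print(f"inter-cluster similarity: {inter:.3f}")
```

In a good clustering, each intra-cluster value should be clearly higher than the inter-cluster value, which is what the definition above asks for.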
Traditionally, clustering has been applied to numeric data. More recently, it has also been applied to text data. Text clustering groups text objects of different granularities, such as documents, paragraphs, sentences, or terms. It is used in many tasks related to text data, for example, corpus summarization, document organization, document classification, taxonomy generation of web documents, organizing search engine...