Semantic similarity refers to how closely related two or more texts are, that is, how alike two words, sentences, or other text entities are. Finding similarities is useful as a classification technique and has been used by applications such as spelling checkers and plagiarism checkers.
We can assess the similarity between two words using a number of techniques. At the simplest level, we can count how many insertion, deletion, and substitution operations are required to convert one word into the other. This count is commonly called the edit distance: the fewer operations needed, the more alike the spellings are.
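The operation counting described above can be sketched as follows. This is a minimal illustration using the standard dynamic-programming formulation of edit (Levenshtein) distance, assuming every insertion, deletion, and substitution costs 1; the function name `edit_distance` is our own choice and does not come from any particular library.

```python
def edit_distance(source: str, target: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn source into target."""
    # previous[j] is the distance between the source prefix processed
    # so far and the first j characters of target. Before any source
    # character is read, that is simply j insertions.
    previous = list(range(len(target) + 1))

    for i, s_char in enumerate(source, start=1):
        # Turning the first i source characters into an empty target
        # takes i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

A distance of 0 means the two words are identical. Because the raw count grows with word length, it is often divided by the length of the longer word when scores for different word pairs need to be compared.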
At a deeper level, we can examine the meaning of words to determine their similarity. For example, the words "teaching" and "instructing" are spelled very differently, but they convey the same basic concept. Stemming and lemmatization can be useful in making these types of comparisons...
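To give a rough idea of how stemming helps with such comparisons, the sketch below reduces words to a crude stem before comparing them. It is a deliberately simplified, hypothetical suffix stripper (the names `SUFFIXES`, `crude_stem`, and `same_stem` are our own), not the Porter algorithm or any library's stemmer, and it assumes plain English words.

```python
# Suffixes are tried in order; this tiny list is an assumption made
# for illustration and covers only a few common English endings.
SUFFIXES = ("ing", "es", "ed", "s")


def crude_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def same_stem(first: str, second: str) -> bool:
    """Report whether two words reduce to the same crude stem."""
    return crude_stem(first) == crude_stem(second)


if __name__ == "__main__":
    print(crude_stem("teaching"), crude_stem("teaches"))
    print(same_stem("teaching", "teaches"))
    print(same_stem("teaching", "instructing"))
```

Stemming of this kind only normalizes different forms of the same word, so "teaching" and "teaches" compare as equal. Recognizing that "teaching" and "instructing" share a meaning would additionally require some knowledge of word meaning, such as a synonym dictionary.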