Lexical diversity
Consider a speaker who uses the term allow many times throughout a speech, compared with another speaker who uses the terms allow, concur, acquiesce, accede, and avow to express the same idea. The latter speech has more lexical diversity than the former. Lexical diversity is widely believed to be an important parameter for rating a document in terms of textual richness and effectiveness.
Lexical diversity, in simple terms, is a measure of the breadth and variety of vocabulary used in a document. Measures of lexical diversity include TTR, MSTTR, MATTR, C, R, CTTR, U, S, K, Maas, HD-D, MTLD, and MTLD-MA.
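To make the idea concrete, here is a minimal Python sketch of a few of the count-based measures, applied to the two speakers described above. It is an illustration only, not the koRpus implementation: the function name `lexical_diversity` is hypothetical, the formulas are the commonly cited textbook definitions, tokens are assumed to be already split into words, and the Uber index is assumed to use a base-10 logarithm.

```python
import math

def lexical_diversity(tokens):
    """Return a few count-based diversity measures for a list of word tokens.

    Hypothetical helper for illustration; not part of any library.
    """
    n = len(tokens)                        # N: total number of tokens
    v = len({t.lower() for t in tokens})   # V: number of distinct types
    if n < 2:
        raise ValueError("need at least two tokens")

    measures = {
        "TTR": v / n,                          # type-token ratio
        "R": v / math.sqrt(n),                 # root TTR
        "CTTR": v / math.sqrt(2 * n),          # corrected TTR
        "C": math.log10(v) / math.log10(n),    # log TTR
    }
    # The Uber index divides by (log N - log V), so it is undefined
    # when every token is a distinct type (V == N).
    if v < n:
        measures["U"] = math.log10(n) ** 2 / (math.log10(n) - math.log10(v))
    return measures

# Speaker A repeats one word; speaker B uses five different words.
speaker_a = ["allow", "allow", "allow", "allow", "allow"]
speaker_b = ["allow", "concur", "acquiesce", "accede", "avow"]

print(lexical_diversity(speaker_a)["TTR"])  # 0.2
print(lexical_diversity(speaker_b)["TTR"])  # 1.0
```

The plain type-token ratio depends on text length, since longer texts repeat words more often. R, CTTR, C, and U are different ways of rescaling the same two counts to reduce that dependence.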
The koRpus package in R provides functions to estimate lexical diversity or complexity.
If N is the total number of tokens and V is the number of types:
Measure | Description | Wrapper Function (koRpus package in R) |
---|---|---|
TTR | Type-Token Ratio | TTR |
MSTTR | Mean segment type token ratio | MSTTR |
C | logTTR | C.ld |
R | Root TTR | R.ld |
CTTR | Corrected TTR | CTTR |
U | Uber Index | U... |