Language detection
TextCat is a text classification utility whose primary use is language identification. The textcat
package in R provides wrapper functions for n-gram-based text categorization and language detection. Its byte-level profile database, TC_byte_profiles, covers 75 languages:
> library(textcat)
> my.profiles <- TC_byte_profiles[names(TC_byte_profiles)]
> my.profiles
A textcat profile db of length 75.
> my.text <- c("This book is in English language",
+              "Das ist ein deutscher Satz.",
+              "Il s'agit d'une phrase française.",
+              "Esta es una frase en español.")
> textcat(my.text, p = my.profiles)
[1] "english" "german"  "french"  "spanish"
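When the set of possible languages is known in advance, it may help to restrict the profile database to those candidates, since the classifier can then only choose among them. The following is a minimal sketch of that idea, not a recommended configuration: the names candidates and small.profiles are introduced here for illustration, and the choice of four languages is an assumption.

```r
library(textcat)

# Hypothetical shortlist of languages we expect to see (an assumption
# made for this illustration, not part of the original example).
candidates <- c("english", "german", "french", "spanish")

# Subset the byte-level profile database by name, using the same
# indexing style as the session above.
small.profiles <- TC_byte_profiles[candidates]

my.text <- c("This book is in English language",
             "Das ist ein deutscher Satz.")

# Classify against the reduced set of profiles only.
result <- textcat(my.text, p = small.profiles)
print(result)
```

With a smaller database, every input is assigned to one of the listed languages at best, so text in any other language will be mislabelled; the shortlist should only be used when that assumption really holds.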