LingPipe supports language identification. It relies on a model derived from training data in the Leipzig Corpora Collection (http://corpora.uni-leipzig.de/en?corpusId=deu_newscrawl_2011). In this recipe, we will demonstrate how to use this model to identify the language of a document.
Detecting the natural language in use using LingPipe
Getting ready
To prepare, we need to follow these steps:
- Create a new Maven project
- Add the following dependency to the project's POM file:
<!-- https://mvnrepository.com/artifact/de.julielab/aliasi-lingpipe -->
<dependency>
<groupId>de.julielab</groupId>
<artifactId>aliasi-lingpipe</artifactId...