Clustering words by their lexemes
Words that look alike, such as the inflected forms of a single base word, can be grouped into clusters. The clustering algorithm in the lexeme-clustering package is based on Janicki's research paper, "A Lexeme-Clustering Algorithm for Unsupervised Learning of Morphology". The paper is available at http://skil.informatik.uni-leipzig.de/blog/wp-content/uploads/proceedings/2012/Janicki2012.37.pdf.
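To build intuition for what "looking alike" means before installing the package, here is a deliberately naive sketch. It is not Janicki's algorithm and does not use the lexeme-clustering API. It simply puts two words in the same cluster when their first k characters match. The function name `clusterByPrefix` and the prefix length of 3 are hypothetical choices made only for this illustration.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical, naive stand-in for lexeme clustering (NOT Janicki's algorithm):
-- words whose first k characters match end up in the same cluster.
clusterByPrefix :: Int -> [String] -> [[String]]
clusterByPrefix k ws = groupBy ((==) `on` key) (sortOn key ws)
  where
    -- The cluster key is just the word's k-character prefix.
    key = take k

main :: IO ()
main = mapM_ print (clusterByPrefix 3 sample)
  where
    sample = ["walked", "run", "talk", "walking", "runs", "walk", "talks", "running"]
```

Running `main` prints three clusters: `["run","runs","running"]`, `["talk","talks"]`, and `["walked","walking","walk"]`. A fixed prefix quickly breaks down: it would merge "wallet" with "walk" and would never group "ran" with "run". Shortcomings like these are why a principled approach such as the one in Janicki's paper is worth using.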
Getting ready
This recipe requires an Internet connection to download the package from GitHub.
How to do it…
Follow these steps to install and use the library:
- Obtain the lexeme-clustering library from GitHub. If Git is not installed, download the archive from https://github.com/BinRoot/lexeme-clustering/archive/master.zip instead. Otherwise, clone the repository with the following command:
$ git clone https://github.com/BinRoot/lexeme-clustering
- Change into the library's directory:
$ cd lexeme-clustering/
- Install the package:
$ cabal install
- Create an input file with a different...