Identifying key words in a corpus of text
One way to predict the topic of a paragraph or sentence is by identifying what the words mean. While the parts of speech give some insight about each word, they still don't reveal the connotation of that word. In this recipe, we will use a Haskell library to tag words by topics such as PERSON, CITY, DATE, and so on.
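To make the idea of topic tagging concrete, here is a minimal sketch that labels words using a hand-written lookup table. It is not how the sequor library works and does not use its API; the Tag type, the gazetteer entries, and the tagWords function are all hypothetical names invented for this illustration.

```haskell
import qualified Data.Map as Map
import Data.Char (isAlpha)

-- Hypothetical tag set for illustration only; these are not sequor's types.
-- O marks a word that belongs to no known topic.
data Tag = PERSON | CITY | DATE | O
  deriving (Show, Eq)

-- A toy lookup table made up for this sketch.
gazetteer :: Map.Map String Tag
gazetteer = Map.fromList
  [ ("Alice", PERSON)
  , ("Paris", CITY)
  , ("Monday", DATE)
  ]

-- Split the text on whitespace, drop non-letter characters from each word,
-- and look the result up in the table. Unknown words are tagged O.
tagWords :: String -> [(String, Tag)]
tagWords = map tag . words
  where
    tag w =
      let w' = filter isAlpha w
      in (w', Map.findWithDefault O w' gazetteer)

main :: IO ()
main = mapM_ print (tagWords "Alice flew to Paris on Monday.")
```

A fixed table like this cannot handle words it has never seen or words whose meaning depends on context, which is why the recipe relies on a dedicated library instead.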
Getting ready
An Internet connection is necessary for this recipe to download the sequor package.
Install it using cabal:
$ cabal install sequor --prefix=`pwd`
Otherwise, follow these directions to install it manually:
- Obtain the latest version of the sequor library by opening up a browser and visiting the following URL: http://hackage.haskell.org/package/sequor.
- Under the Downloads section, download the cabal source package.
- Extract the contents:
- On Windows, it is easiest to use 7-Zip, an easy-to-use file archiver. Install it on your machine by going to http://www.7-zip.org. Then, using 7-Zip, extract the contents of the tarball.
- On other...