Getting started with Natural Language Toolkit (NLTK)
NLTK is a powerful Python library for computational linguistics and text classification. It includes about 50 corpora and lexical resources, such as WordNet. NLTK is one of the most widely used tools for natural language processing in Python, and it provides algorithms for text tokenization, parsing, semantic reasoning, and text classification. A complete guide to NLTK is available at http://nltk.org/.
To install NLTK on Windows, we just need to download the executable installer from the website. On Linux distributions, we can use easy_install.
Tip
We may need to install PyYAML in order to use NLTK. We can download PyYAML from http://pyyaml.org/wiki/PyYAML.
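After installing, it can help to confirm that Python actually sees the packages. The following is a minimal sketch that uses only the standard library; the helper name is_installed is our own and is not part of NLTK. It assumes that PyYAML is imported under the module name yaml.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    # is_installed is a hypothetical helper for this check, not an NLTK function.
    return importlib.util.find_spec(module_name) is not None


# "nltk" is the toolkit itself; "yaml" is the import name used by PyYAML.
for name in ("nltk", "yaml"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either line reports "missing", the corresponding package still needs to be installed for the interpreter we are running.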
NLTK defines four basic classifiers:
Naive Bayes
Maximum entropy (or Logistic regression)
Decision tree
Conditional exponential
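To give a feel for how these classifiers are used, here is a minimal sketch that trains the Naive Bayes classifier on a handful of invented sentences. The training data, the labels "pos" and "neg", and the helper word_features are assumptions made up for illustration; only nltk.NaiveBayesClassifier and its train, classify, and labels methods come from NLTK. The classifier expects each training example as a (feature dictionary, label) pair.

```python
import nltk


def word_features(sentence):
    """Turn a sentence into a bag-of-words feature dictionary."""
    # word_features is a hypothetical helper, not part of NLTK.
    return {word.lower(): True for word in sentence.split()}


# Tiny, invented training set: (feature dictionary, label) pairs.
train_set = [
    (word_features("a wonderful and moving film"), "pos"),
    (word_features("great acting and a wonderful story"), "pos"),
    (word_features("a boring and terrible film"), "neg"),
    (word_features("terrible acting and a dull story"), "neg"),
]

# Train the classifier on the labeled examples.
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Classify an unseen sentence; the result is one of the training labels.
predicted = classifier.classify(word_features("a wonderful story"))
print(predicted)
print(sorted(classifier.labels()))
```

With so little data the prediction is only illustrative. The decision tree and maximum entropy classifiers also provide a train method that accepts labeled feature sets in the same format, although their optional parameters differ.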
Tip
In this chapter, we will use NLTK 3.0, which supports Python 3. However, it is still an alpha release (as of September 2013) and is likely to contain bugs. We can download NLTK 3 from http...