Finding sentences
Words (tokens) aren't the only structures that we're interested in, however. Another useful grammatical structure is the sentence. In this recipe, we'll follow a process similar to the one in the previous recipe, Tokenizing text, to create a function that pulls sentences from a string, just as tokenize pulled tokens from one.
Getting ready
We'll need to include clojure-opennlp in our project.clj file:
(defproject com.ericrochester/text-data "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [clojure-opennlp "0.3.2"]])
We will also need to require it into the current namespace:
(require '[opennlp.nlp :as nlp])
Finally, we'll download a model for a statistical sentence splitter. I downloaded en-sent.bin from http://opennlp.sourceforge.net/models-1.5/. I then saved it into models/en-sent.bin.
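Because the model is loaded from disk by path, it can be worth confirming that the file is where we expect before going further. The following is a minimal sketch that uses only clojure.java.io from the standard library. The helper name model-available? is hypothetical and is not part of clojure-opennlp. The path assumes the model was saved under models/ relative to the directory the REPL was started from.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper (not part of clojure-opennlp): returns true
;; when a regular file exists at the given path, false otherwise.
(defn model-available?
  [path]
  (let [f (io/file path)]
    (and (.exists f) (.isFile f))))

;; Assumes the model was saved as described above, relative to the
;; directory the REPL was started from.
(model-available? "models/en-sent.bin")
```

If this returns false, the most likely causes are a different working directory or a download that was saved under another name.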
How to do it…
As in the...