Simple Java SBDs
Sometimes, text may be simple enough that Java's core support will suffice. There are two approaches that can perform SBD: using regular expressions and using the BreakIterator class. We will examine both approaches here.
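As a quick preview of the second approach, the following is a minimal sketch of sentence splitting with the JDK's java.text.BreakIterator. The class name BreakIteratorSketch, the helper method sentences, the choice of Locale.US, and the sample text are our own illustrative assumptions, not code from this chapter.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakIteratorSketch {
    // Hypothetical helper: returns the sentences found by the JDK's
    // locale-sensitive sentence BreakIterator (Locale.US is an assumption)
    static List<String> sentences(String text) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
        iterator.setText(text);
        List<String> result = new ArrayList<>();
        int start = iterator.first();
        // next() yields each successive boundary until it returns DONE
        for (int end = iterator.next(); end != BreakIterator.DONE;
                start = end, end = iterator.next()) {
            // Trailing whitespace belongs to the preceding sentence, so trim it
            result.add(text.substring(start, end).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Is this a sentence? Yes, it is! The weather is nice.";
        for (String sentence : sentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

Unlike a regular expression split, the iterator keeps the terminating punctuation with each sentence, because it reports boundary positions rather than removing delimiters.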
Using regular expressions
Regular expressions can be difficult to understand. While simple expressions are not usually a problem, as they become more complex, their readability worsens. This is one of the limitations of regular expressions when trying to use them for SBD.
We will present two different regular expressions. The first expression is simple but does not do a very good job; it illustrates a solution that may be too simple for some problem domains. The second is more sophisticated and does a better job.
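To make the weakness of an overly simple expression concrete, here is a small sketch that splits on every period, question mark, and exclamation mark. The class name SimpleRegexSplitSketch, the helper naiveSplit, and the sample text (an abbreviation plus a decimal number) are illustrative assumptions of our own.

```java
import java.util.Arrays;

public class SimpleRegexSplitSketch {
    // Hypothetical helper: splits wherever '.', '?', or '!' appears.
    // Inside [...] the period is a literal character, not a wildcard.
    static String[] naiveSplit(String text) {
        return text.split("[.?!]");
    }

    public static void main(String[] args) {
        // Two real sentences, but "Mr." and "3.50" also contain periods
        String text = "Mr. Smith paid 3.50 dollars. Was it enough?";
        String[] parts = naiveSplit(text);
        System.out.println(parts.length + " pieces: " + Arrays.toString(parts));
    }
}
```

The text holds two sentences, yet the split produces four pieces ("Mr", " Smith paid 3", "50 dollars", " Was it enough"), and the terminating punctuation is discarded. Handling cases like these is what a more sophisticated expression has to address.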
In this example, we create a regular expression that matches periods, question marks, and exclamation marks. The String class's split method is then used to split the text into sentences:
String simple = "[.?!]";
String[] splitString...