In this recipe, we will use LingPipe's RegExChunker class to illustrate another approach for finding entities. This approach is based on the concept of chunks, which are spans of text that each represent a unit of data, such as an entity. Specifically, we will show how to identify email addresses within text.
Using chunks with regular expressions to identify entities
Getting ready
To prepare, we need to follow these steps:
- Create a new Maven project
- Add the following dependency to the project's POM file:
<dependency>
    <groupId>de.julielab</groupId>
    <artifactId>aliasi-lingpipe</artifactId>
    <version>4.1.0</version>
</dependency>
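Before working with LingPipe itself, it may help to see what a regular-expression chunk looks like in plain Java. The following is a minimal sketch that uses only java.util.regex and is not LingPipe's API: the EmailChunkSketch class, its Chunk type, and the simplified email pattern are all assumptions made for illustration. It treats each match as a typed span with a start offset, an end offset, and a label.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch: uses only the JDK's java.util.regex, not LingPipe's RegExChunker.
class EmailChunkSketch {

    // Hypothetical, deliberately simple email pattern (an assumption, not a full validator).
    static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // A chunk modeled as a typed span: start (inclusive), end (exclusive), label, matched text.
    static final class Chunk {
        final int start;
        final int end;
        final String type;
        final String text;

        Chunk(int start, int end, String type, String text) {
            this.start = start;
            this.end = end;
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + " [" + start + ", " + end + "): " + text;
        }
    }

    // Scans the input and returns one chunk per regular-expression match, in order.
    static List<Chunk> findEmailChunks(String input) {
        List<Chunk> chunks = new ArrayList<>();
        Matcher matcher = EMAIL.matcher(input);
        while (matcher.find()) {
            chunks.add(new Chunk(matcher.start(), matcher.end(), "EMAIL", matcher.group()));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "Contact alice@example.com or bob.smith@mail.example.org for details.";
        for (Chunk chunk : findEmailChunks(text)) {
            System.out.println(chunk);
        }
    }
}
```

Each chunk records where the match sits in the original string, so the entity can be recovered later with a substring call on those offsets. A chunker from a library would be expected to report similar span information, but the exact types and method names will differ from this sketch.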