Extracting data from a PDF
PDF files are ubiquitous because almost every PC, Mac, and smart device can open and process the format. Electronic documents are often exchanged as PDF because they are read-only by default and cannot be easily altered.
Many organizations use PDF files to distribute reports, bank statements, and invoices. Being able to read such documents and extract the information they contain is an invaluable tool in the belt of a Groovy programmer.
This recipe focuses on mining information from a PDF file.
Getting ready
As with ZIP files (see the Reading data from a ZIP file recipe), Groovy doesn't have any class to deal with PDF files. Java doesn't offer any built-in feature to read or write PDFs either. Therefore, we have to resort to a third-party library. A Google search for "Java read PDF" yields numerous results with links to various libraries.
In this recipe, we will use iText, the most popular PDF library for the Java ecosystem. iText is a very powerful...