Spreadsheets contain a lot of useful data. In this recipe, we will illustrate how to extract text from an Excel spreadsheet using the Apache POI API. We will create a sample spreadsheet for the examples to work against.
Extracting text from a spreadsheet
Getting ready
To prepare this recipe, we need to do the following:
- Create a new Maven project.
- Add the following dependency to the project's POM file (the poi-ooxml artifact handles .xlsx files and pulls in the core poi artifact transitively):
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>4.0.1</version>
</dependency>
- Create a new Excel spreadsheet that appears as follows. The last cell entry contains a hyperlink to www.weather.com:
[Image: the sample Excel spreadsheet]
- Modify...