Spreadsheets hold a lot of useful data. In this recipe, we will illustrate how to extract text from an Excel spreadsheet using the Apache POI API. We will create a sample spreadsheet for the examples to work against.
Extracting text from a spreadsheet
Getting ready
To prepare for this recipe, we need to do the following:
- Create a new Maven project.
- Add the following dependency to the project's POM file:
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>4.0.1</version>
</dependency>
- Create a new Excel spreadsheet that appears as follows. The last cell entry contains a hyperlink to www.weather.com:
![](https://static.packt-cdn.com/products/9781789801156/graphics/assets/b24b3a5c-d332-4754-baa1-cbcc144209f2.png)
- Modify...
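As background for the preparation steps above, it can help to see what an `.xlsx` file actually is: a ZIP archive of XML parts, with most cell text stored once in `xl/sharedStrings.xml`. The following is a minimal sketch of that idea using only JDK classes (Java 9 or later is assumed). It is not the library-based approach this recipe uses, and the class name `SharedStringsSketch`, its methods, and the sample values are hypothetical, chosen for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only.
class SharedStringsSketch {

    // XML namespace used by SpreadsheetML parts such as sharedStrings.xml.
    static final String MAIN_NS =
            "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns one string per <si> entry found in xl/sharedStrings.xml.
    static List<String> extractSharedStrings(InputStream xlsx) throws Exception {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("xl/sharedStrings.xml")) {
                    continue;
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPE declarations, since the file may be untrusted.
                factory.setFeature(
                        "http://apache.org/xml/features/disallow-doctype-decl", true);
                // Copy the entry first so the parser cannot close the ZIP stream.
                Document doc = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(zip.readAllBytes()));
                NodeList items = doc.getElementsByTagNameNS(MAIN_NS, "si");
                for (int i = 0; i < items.getLength(); i++) {
                    // A rich-text entry splits its text across several <t> runs.
                    NodeList runs = ((Element) items.item(i))
                            .getElementsByTagNameNS(MAIN_NS, "t");
                    StringBuilder text = new StringBuilder();
                    for (int j = 0; j < runs.getLength(); j++) {
                        text.append(runs.item(j).getTextContent());
                    }
                    result.add(text.toString());
                }
            }
        }
        return result;
    }

    // Builds a stand-in archive holding only the shared-strings part.
    // It is NOT a valid workbook; it exists so the sketch runs without a file.
    static byte[] sampleArchive() throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<sst xmlns=\"" + MAIN_NS + "\">"
                + "<si><t>City</t></si>"
                + "<si><r><t>Wea</t></r><r><t>ther</t></r></si>"
                + "</sst>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("xl/sharedStrings.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Pass the path of an .xlsx file, or run without arguments for the sample.
        try (InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(sampleArchive())) {
            for (String s : extractSharedStrings(in)) {
                System.out.println(s);
            }
        }
    }
}
```

This sketch only sees shared strings. Numbers, dates, inline strings, and hyperlink targets live in other parts of the archive, and the older binary `.xls` format is not a ZIP file at all, which is why a dedicated spreadsheet library is the practical choice for real extraction work.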