Metadata, in this recipe, refers to information about a PDF document, such as its title, subject, author, and creation date. We will illustrate how to use the Apache PDFBox API to extract this information, reusing some of the steps found in the Extracting text from a PDF document recipe.
Extracting metadata from a PDF document
Getting ready
To prepare for this recipe, we need to do the following:
- Create a new Maven project
- Add the following dependency to the project's POM file:
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.13</version>
</dependency...
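With the PDFBox dependency on the classpath, reading the metadata described above might look roughly like the following. This is a minimal sketch against the PDFBox 2.0.x API, not necessarily the code this recipe develops: the class name MetadataSketch, the helper readMetadata, and the use of a file path taken from the command line are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.Calendar;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

// Hypothetical class name, used only for this illustration.
public class MetadataSketch {

    // Hypothetical helper: copies a few common fields from the document
    // information dictionary into an ordered map. A field that is absent
    // from the PDF is stored as null.
    static Map<String, String> readMetadata(PDDocument document) {
        PDDocumentInformation info = document.getDocumentInformation();
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Title", info.getTitle());
        metadata.put("Subject", info.getSubject());
        metadata.put("Author", info.getAuthor());
        // The creation date is returned as a Calendar, or null if not set.
        Calendar created = info.getCreationDate();
        metadata.put("Created", created == null ? null : created.toInstant().toString());
        return metadata;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the path to a PDF file is passed as the first argument.
        File file = new File(args[0]);
        // try-with-resources closes the document when we are done with it.
        try (PDDocument document = PDDocument.load(file)) {
            readMetadata(document).forEach(
                    (name, value) -> System.out.println(name + ": " + value));
        }
    }
}
```

Collecting the fields in a map keeps the extraction separate from the printing, so the same helper could be applied to a document loaded from a stream or a byte array instead of a file.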