Extracting metadata from PDF
By now, you have probably noticed a pattern in how metadata is embedded in and extracted from various file types. We will cover one more format: PDF, the format most often used for documents shared on the Web. You are likely reading this book in its PDF version, and even if you are not, it is available in PDF format. A PDF file contains metadata fields that are standard for all files of the type: title, description, author, date of creation, and more. This metadata can be embedded when the file is created in Adobe Acrobat or another application capable of writing PDF files.
How to do it...
1. Install the cc_metaexec extension.
2. Install pdfinfo. This utility can be installed on both Windows and Linux machines. We will install it on a Debian server using APT:
Shell > apt-get install xpdf
This command will install pdfinfo, along with other related tools.
Note
In Debian, the pdfinfo tool is not a package of its own; it ships inside a different package (xpdf in the command above). This may be the case for your operating system as well...
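Once pdfinfo is installed, it can be useful to check what it reports for a given file before relying on the extension. The following is a minimal sketch in Python, not the way cc_metaexec itself calls the tool. It assumes pdfinfo is on the PATH and prints one "Key: value" pair per line, and the function names parse_pdfinfo_output and read_pdf_metadata are hypothetical helpers introduced only for this illustration.

```python
import shutil
import subprocess


def parse_pdfinfo_output(text):
    """Turn pdfinfo's 'Key: value' lines into a dictionary of strings."""
    metadata = {}
    for line in text.splitlines():
        # Split on the first colon only: values such as dates contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()
    return metadata


def read_pdf_metadata(pdf_path):
    """Run pdfinfo on pdf_path and return its parsed output."""
    # Fail early with a clear message if the tool is not installed.
    if shutil.which("pdfinfo") is None:
        raise RuntimeError("pdfinfo was not found on PATH")
    result = subprocess.run(
        ["pdfinfo", pdf_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdfinfo exits with an error
    )
    return parse_pdfinfo_output(result.stdout)
```

Calling read_pdf_metadata with the path to any PDF file returns a dictionary keyed by the field names pdfinfo prints, such as Title or Author. Parsing is kept separate from the subprocess call so that it can be checked against captured output on a machine where pdfinfo is not installed.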