In this section, we review how to extract metadata from PDF documents with the PyPDF2 module.
Extracting metadata from PDF documents
Introduction to PyPDF2
One of the modules available in Python to extract data from PDF documents is PyPDF2. The module can be installed directly with pip, since it is hosted on the Python Package Index (PyPI), the official Python package repository.
The latest version of this module is listed at https://pypi.org/project/PyPDF2/.
This module lets us extract document information, and encrypt and decrypt documents. To extract metadata, we can use the PdfFileReader class and its getDocumentInfo() method, which returns a dictionary with the document's metadata under keys such as /Author, /Title, and /Producer.
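As a minimal sketch of the PdfFileReader and getDocumentInfo() usage described above, the snippet below reads the information dictionary and formats it for display. It assumes a PyPDF2 release older than 3.0, where these names are available; newer releases renamed them to PdfReader and the metadata property. The helper names get_pdf_metadata and format_metadata are hypothetical and chosen only for illustration.

```python
def format_metadata(info):
    """Turn a metadata mapping into sorted 'key: value' lines."""
    if not info:
        return []
    # PDF metadata keys start with a slash (for example '/Author');
    # strip it for display.
    return [f"{key.lstrip('/')}: {info[key]}" for key in sorted(info)]


def get_pdf_metadata(path):
    """Return the document information of the PDF at `path` as a plain dict."""
    # Imported here so format_metadata() works without PyPDF2 installed.
    # Assumption: a PyPDF2 release older than 3.0 is installed.
    from PyPDF2 import PdfFileReader

    with open(path, "rb") as pdf_file:
        reader = PdfFileReader(pdf_file)
        # getDocumentInfo() may return None when the PDF has no metadata.
        info = reader.getDocumentInfo()
        # Convert to plain strings while the file is still open.
        return {str(key): str(value) for key, value in info.items()} if info else {}
```

With a local file, for example a hypothetical document.pdf, the two helpers can be combined as `print("\n".join(format_metadata(get_pdf_metadata("document.pdf"))))`. Which keys appear depends on the tool that produced the PDF, so the code should not rely on any particular key being present.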
The following function would allow...