PDF (Portable Document Format) is a common format for documents. It started as a way to describe a document for any printer, so a PDF is printed exactly as it is displayed, which makes it a reliable way of guaranteeing consistency. It has become a powerful standard for sharing documents, especially read-only ones.
Reading PDF files
Getting ready
For this recipe, we are going to use the PyPDF2 module. We need to add it to our virtual environment:
$ echo "PyPDF2==1.26.0" >> requirements.txt
$ pip install -r requirements.txt
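Before going further, it can be useful to confirm that the module landed in the active virtual environment at the pinned version. The following is a minimal sketch, not part of the recipe itself: the helper name installed_version is hypothetical, and it assumes Python 3.8 or later, where importlib.metadata is part of the standard library.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    expected = "1.26.0"  # the version pinned in requirements.txt
    found = installed_version("PyPDF2")
    if found is None:
        print("PyPDF2 is not installed in this environment")
    elif found != expected:
        print(f"PyPDF2 {found} is installed, but {expected} was pinned")
    else:
        print(f"PyPDF2 {found} is installed as pinned")
```

Running the script inside the virtual environment prints one of the three messages. If it reports that the module is missing, the usual cause is that pip was run outside the activated environment.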
In the GitHub directory Chapter03/documents, we have prepared two documents, document-1.pdf and document-2.pdf,...