Reading PDF files
A common format for documents is PDF (Portable Document Format). It started as a way to describe a document for any printer, ensuring that the document is printed exactly as it is shown. It has become a powerful standard for sharing documents, especially final documents that are intended to be read-only.
Getting ready
For this recipe, we are going to use the PyPDF2 module. We need to add it to our virtual environment:
$ echo "PyPDF2==1.26.0" >> requirements.txt
$ pip install -r requirements.txt
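To confirm that the pinned version actually ended up in the virtual environment, a small check like the following can help. It is a minimal sketch that assumes Python 3.8 or later (for the standard library's importlib.metadata); the helper name installed_version is hypothetical and not part of PyPDF2.

```python
from importlib.metadata import version, PackageNotFoundError


def installed_version(package):
    """Return the installed version string of package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Compare what is installed against the pin written to requirements.txt
    found = installed_version("PyPDF2")
    if found is None:
        print("PyPDF2 is not installed; run pip install -r requirements.txt")
    elif found != "1.26.0":
        print(f"PyPDF2 {found} is installed, but requirements.txt pins 1.26.0")
    else:
        print("PyPDF2 1.26.0 is installed")
```

Run it from inside the activated virtual environment, otherwise it reports on whichever Python interpreter happens to be first on the path.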
In the GitHub directory Chapter03/documents, we have prepared two documents, document-1.pdf and document-2.pdf, to use in this recipe. Note that they contain mostly Lorem Ipsum text, which is just placeholder text.
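Before processing the sample files, it can be useful to check that they are really PDFs. A PDF file normally begins with a header line of the form %PDF-1.x, so reading the first five bytes gives a quick sanity check. This is a simple heuristic sketch, not a validator: some readers tolerate junk before the header, and this check does not. The helper name looks_like_pdf is hypothetical, and the relative path assumes the script is run from the root of the repository.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file starts with the %PDF- signature."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


if __name__ == "__main__":
    # Assumed location, relative to the repository root
    documents = Path("Chapter03/documents")
    for name in ("document-1.pdf", "document-2.pdf"):
        path = documents / name
        if not path.exists():
            print(f"{path}: not found")
        else:
            print(f"{path}: {'PDF' if looks_like_pdf(path) else 'not a PDF'}")
```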
Lorem Ipsum text is commonly used in design to show text without needing to create the content before the design. You can learn more about it here: https://loremipsum.io/.
They are...