Scanning documents for a keyword
In this recipe, we will apply the lessons from the previous recipes to search all the files in a directory for a particular keyword. It recaps the other recipes in this chapter and includes a script that searches different kinds of files.
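As a rough illustration of the core idea, the following minimal sketch walks a directory and reports which lines of plain-text files contain a keyword. It is not the recipe's script: the function name scan_text_files is hypothetical, only .txt files are handled, and the match is assumed to be case-insensitive. The other file types need the modules listed in the next section.

```python
import os


def scan_text_files(directory, keyword):
    """Yield (path, line_number) for each .txt line containing keyword.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    needle = keyword.lower()
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            # Only plain-text files are handled in this sketch.
            if not name.endswith(".txt"):
                continue
            path = os.path.join(root, name)
            # errors="replace" avoids crashing on unexpected encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line.lower():
                        yield path, number
```

Calling list(scan_text_files("dir", "the")) would collect every match under dir. Using a generator keeps memory use low when the directory is large.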
Getting ready
Be sure to include the following modules in the requirements.txt file and install them into your virtual environment:
beautifulsoup4==4.8.2
Pillow==7.0.0
PyPDF2==1.26.0
python-docx==0.8.10
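To confirm that the virtual environment matches these pins, a small check along the following lines may help. It is a sketch that assumes Python 3.8 or later (for importlib.metadata); the REQUIRED dictionary and the check_requirements function are hypothetical names, not part of the recipe.

```python
from importlib import metadata

# Pinned versions copied from the requirements list above.
REQUIRED = {
    "beautifulsoup4": "4.8.2",
    "Pillow": "7.0.0",
    "PyPDF2": "1.26.0",
    "python-docx": "0.8.10",
}


def check_requirements(required):
    """Return {package: problem} for missing or mismatched packages."""
    problems = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != wanted:
            problems[name] = f"found {installed}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for name, problem in check_requirements(REQUIRED).items():
        print(f"{name}: {problem}")
```

When run as a script, it prints one line per package that is missing or installed at a different version, and prints nothing if everything matches.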
Check that the directory to search has the following files (all are available at https://github.com/PacktPublishing/Python-Automation-Cookbook-Second-Edition/tree/master/Chapter04/documents/). Note that, for simplicity, file5.pdf and file6.pdf are copies of document-1.pdf, and file1.txt to file4.txt are empty files:
├── dir
│ ├── file1.txt
│ ├── file2.txt
│ ├── file6.pdf...