We create a very basic argument handler that accepts one positional input, DIR_PATH, the path of the input directory to iterate. As an example, we will use the ~/Desktop path, the parent of SecretDocs, as the input argument for the script. We parse the command-line arguments and assign the input directory to a local variable. We’re now ready to begin iterating over this input directory:
from __future__ import print_function
import argparse
import os
__authors__ = ["Chapin Bryce", "Preston Miller"]
__date__ = 20170815
__description__ = "Directory tree walker"
parser = argparse.ArgumentParser(
description=__description__,
epilog="Developed by {} on {}".format(
", ".join(__authors__), __date__)
)
parser.add_argument("DIR_PATH", help="Path to directory")
args = parser.parse_args()
path_to_scan = args.DIR_PATH
To iterate over a directory, we need to provide a string representing its path to os.walk(). This method returns three objects in each iteration, which we have captured in the root, directories, and files variables:
- root: This value provides the relative path to the current directory as a string. Using the example directory structure, root would start as SecretDocs and eventually become SecretDocs/Team and SecretDocs/Plans/SuccessfulPlans.
- directories: This value is a list of sub-directories located within the current root location. We can iterate through this list of directories, although the entries in this list will become part of the root value during successive os.walk() calls. For this reason, the value is not frequently used.
- files: This value is a list of files in the current root location.
Be careful in naming the directory and file variables. In Python the dir and file names are reserved for other uses and should not be used as variable names.
# Iterate over the path_to_scan
for root, directories, files in os.walk(path_to_scan):
It is common to create a second for loop, as shown in the following code, to step through each of the files located in that directory and perform some action on them. Using the os.path.join() method, we can join the root and file_entry variables to obtain the file’s path. We then print this file path to the console. We may also, for example, append this file path to a list that we later iterate over to process each of the files:
# Iterate over the files in the current "root"
for file_entry in files:
# create the relative path to the file
file_path = os.path.join(root, file_entry)
print(file_path)
We can also use root + os.sep() + file_entry to achieve the same effect, but it is not as Pythonic as the method we're using to join paths. Using os.path.join(), we can pass two or more strings to form a single path, such as directories, subdirectories, and files.
When we run the preceding script with our example input directory, we see the following output:
As seen, the os.walk() method iterates through a directory, then will descend into any discovered sub-directories, thereby scanning the entire directory tree.