In this article by Preston Miller and Chapin Bryce, authors of Learning Python for Forensics, we introduce a recipe from our upcoming book, Python Digital Forensics Cookbook. In Python Digital Forensics Cookbook, each chapter comprises a number of scripts, or recipes, falling under specific themes. The "Iterating Through Files" recipe shown here is from our chapter that introduces the Sleuth Kit's Python bindings, pytsk3, and other libraries to programmatically interact with forensic evidence containers. Specifically, this recipe shows how to access a forensic evidence container and iterate through all of its files to create an active file listing of its contents.
If you are reading this article, it goes without saying that Python is a key tool in DFIR investigations. However, most examiners are not familiar with or do not take advantage of the Sleuth Kit's Python bindings. Imagine being able to run your existing scripts against forensic containers without needing to mount them or export loose files. This recipe continues our introduction of pytsk3, the library that will allow us to do just that and take our scripting capabilities to the next level.
In this recipe, we learn how to recurse through the filesystem and create an active file listing. Oftentimes, one of the first questions we, as forensic examiners, are asked is "What data is on the device?". An active file listing comes in handy here. Creating a file listing of loose files is a very straightforward task in Python. However, this task is slightly more complicated here because we are working with a forensic image rather than loose files.
This recipe will be a cornerstone for future scripts as it will allow us to recursively access and process every file in the image. As we continue to introduce new concepts and features from the Sleuth Kit, we will add new functionality to our previous recipes in an iterative process. In a similar way, this recipe will become integral in future recipes to iterate through directories and process files.
Getting started
Refer to the Getting started section in the Opening Acquisitions recipe for information on the build environment and setup details for pytsk3 and pyewf. All other libraries used in this script are present in Python's standard library.
How to do it...
We perform the following steps in this recipe:
Import argparse, csv, datetime, os, pytsk3, pyewf, and sys;
Identify if the evidence container is a raw (DD) image or an EWF (E01) container;
Access the forensic image using pytsk3;
Recurse through all directories in each partition;
Store file metadata in a list;
And write the active file list to a CSV (a sample invocation of the finished script follows this list).
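Before walking through the code, here is a hypothetical invocation of the finished script; the script name, evidence path, and output path are placeholders rather than values from the book:

python file_lister.py /cases/001/evidence.E01 ewf file_listing.csv -p DOS

As described in the next section, the -p switch is optional and specifies the partition type.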
How it works...
This recipe's command-line handler takes three positional arguments, EVIDENCE_FILE, TYPE, and OUTPUT_CSV, which represent the path to the evidence file, the type of evidence file, and the output CSV file, respectively. Similar to the previous recipe, the optional p switch can be supplied to specify a partition type. We use the os.path.dirname() method to extract the desired output directory path for the CSV file and, with the os.makedirs() function, create the necessary output directories if they do not exist.
if __name__ == '__main__':
    # Command-line Argument Parser
    parser = argparse.ArgumentParser()
    parser.add_argument("EVIDENCE_FILE", help="Evidence file path")
    parser.add_argument("TYPE", help="Type of Evidence",
                        choices=("raw", "ewf"))
    parser.add_argument("OUTPUT_CSV",
                        help="Output CSV with lookup results")
    parser.add_argument("-p", help="Partition Type",
                        choices=("DOS", "GPT", "MAC", "SUN"))
    args = parser.parse_args()

    directory = os.path.dirname(args.OUTPUT_CSV)
    if not os.path.exists(directory) and directory != "":
        os.makedirs(directory)
Once we have validated the input evidence file by checking that it exists and is a file, the four arguments are passed to the main() function. If there is an issue with initial validation of the input, an error is printed to the console before the script exits.
    if os.path.exists(args.EVIDENCE_FILE) and \
            os.path.isfile(args.EVIDENCE_FILE):
        main(args.EVIDENCE_FILE, args.TYPE, args.OUTPUT_CSV, args.p)
    else:
        print("[-] Supplied input file {} does not exist or is not a "
              "file".format(args.EVIDENCE_FILE))
        sys.exit(1)
In the main() function, we instantiate the volume variable with None to avoid errors when referencing it later in the script. After printing a status message to the console, we check whether the evidence type is an E01 and, if so, create a valid pyewf handle, as demonstrated in more detail in the Opening Acquisitions recipe. Refer to that recipe for more details, as this part of the function is identical. The end result is the creation of the pytsk3 handle, img_info, for the user-supplied evidence file.
def main(image, img_type, output, part_type):
    volume = None
    print "[+] Opening {}".format(image)
    if img_type == "ewf":
        try:
            filenames = pyewf.glob(image)
        except IOError, e:
            print "[-] Invalid EWF format:\n {}".format(e)
            sys.exit(2)

        ewf_handle = pyewf.handle()
        ewf_handle.open(filenames)

        # Open PYTSK3 handle on EWF Image
        img_info = ewf_Img_Info(ewf_handle)
    else:
        img_info = pytsk3.Img_Info(image)
Next, we attempt to access the volume of the image using the pytsk3.Volume_Info() method by supplying it the image handle. If the partition type argument was supplied, we add its attribute ID as the second argument. If we receive an IOError when attempting to access the volume, we catch the exception as e and print it to the console. Notice, however, that we do not exit the script as we often do when we receive an error. We'll explain why in the next function. Ultimately, we pass the volume, img_info, and output variables to the openFS() method.
    try:
        if part_type is not None:
            attr_id = getattr(pytsk3, "TSK_VS_TYPE_" + part_type)
            volume = pytsk3.Volume_Info(img_info, attr_id)
        else:
            volume = pytsk3.Volume_Info(img_info)
    except IOError, e:
        print "[-] Unable to read partition table:\n {}".format(e)

    openFS(volume, img_info, output)
The openFS() method tries to access the filesystem of the container in two ways. If the volume variable is not None, it iterates through each partition and, if that partition meets certain criteria, attempts to open it. If, however, the volume variable is None, it instead tries to directly call the pytsk3.FS_Info() method on the image handle, img. As we saw, this latter method will work and give us filesystem access for logical images, whereas the former works for physical images. Let's look at the differences between these two methods.
Regardless of the method, we create a recursed_data list to hold our active file metadata. In the first instance, where we have a physical image, we iterate through each partition and check that it is greater than 2,048 sectors and does not contain the words "Unallocated", "Extended", or "Primary Table" in its description. For partitions meeting these criteria, we attempt to access the filesystem using the FS_Info() function by supplying the pytsk3 img object and the offset of the partition in bytes.
If we are able to access the filesystem, we use the open_dir() method to get the root directory and pass that, along with the partition address ID, the filesystem object, two empty lists, and a list containing an empty string, to the recurseFiles() method. These empty lists and the string will come into play in recursive calls to this function, as we will see shortly. Once the recurseFiles() method returns, we append the active file metadata to the recursed_data list. We repeat this process for each partition.
def openFS(vol, img, output):
    print "[+] Recursing through files.."
    recursed_data = []
    # Open FS and Recurse
    if vol is not None:
        for part in vol:
            if part.len > 2048 and "Unallocated" not in part.desc and \
                    "Extended" not in part.desc and \
                    "Primary Table" not in part.desc:
                try:
                    fs = pytsk3.FS_Info(
                        img, offset=part.start * vol.info.block_size)
                except IOError, e:
                    print "[-] Unable to open FS:\n {}".format(e)
                    # Skip partitions whose filesystem cannot be opened
                    continue
                root = fs.open_dir(path="/")
                data = recurseFiles(part.addr, fs, root, [], [], [""])
                recursed_data.append(data)
We employ a similar approach in the second instance, where we have a logical image and the volume is None. In this case, we attempt to directly access the filesystem and, if successful, pass it to the recurseFiles() method and append the returned data to our recursed_data list. Once we have our active file list, we send it, along with the user-supplied output file path, to the csvWriter() method. Let's dive into the recurseFiles() method, which is the meat of this recipe.
    else:
        try:
            fs = pytsk3.FS_Info(img)
        except IOError, e:
            print "[-] Unable to open FS:\n {}".format(e)
        else:
            # Only recurse if the filesystem was opened successfully
            root = fs.open_dir(path="/")
            data = recurseFiles(1, fs, root, [], [], [""])
            recursed_data.append(data)
    csvWriter(recursed_data, output)
The recurseFiles() function is based on an example from the pytsk FLS tool (https://github.com/py4n6/pytsk/blob/master/examples/fls.py) and David Cowen's Automating DFIR series tool, dfirwizard (https://github.com/dlcowen/dfirwizard/blob/master/dfirwizard-v9.py). To start this function, we append the root directory inode to the dirs list. This list is used later to avoid unending loops. Next, we begin to loop through each object in the root directory and check that it has certain attributes we would expect and that its name is not either "." or "..".
def recurseFiles(part, fs, root_dir, dirs, data, parent):
    dirs.append(root_dir.info.fs_file.meta.addr)
    for fs_object in root_dir:
        # Skip ".", ".." or directory entries without a name.
        if not hasattr(fs_object, "info") or \
                not hasattr(fs_object.info, "name") or \
                not hasattr(fs_object.info.name, "name") or \
                fs_object.info.name.name in [".", ".."]:
            continue
If the object passes that test, we extract its name using the info.name.name attribute. Next, we use the parent variable, which was supplied as one of the function's inputs, to manually create the file path for this object. There is no built-in method or attribute to do this automatically for us.
We then check whether the file is a directory and set the f_type variable to the appropriate type. If the object is a file and it has an extension, we extract it and store it in the file_ext variable. If we encounter an AttributeError when attempting to extract this data, we continue on to the next object.
        try:
            file_name = fs_object.info.name.name
            file_path = "{}/{}".format(
                "/".join(parent), fs_object.info.name.name)
            try:
                if fs_object.info.meta.type == pytsk3.TSK_FS_META_TYPE_DIR:
                    f_type = "DIR"
                    file_ext = ""
                else:
                    f_type = "FILE"
                    if "." in file_name:
                        file_ext = file_name.rsplit(".")[-1].lower()
                    else:
                        file_ext = ""
            except AttributeError:
                continue
We create variables for the object size and timestamps. Notice, however, that we pass the dates to a convertTime() method. This function exists to convert the UNIX timestamps into a human-readable format. With these attributes extracted, we append them to the data list, along with the partition address ID, to ensure we keep track of which partition each object came from.
            size = fs_object.info.meta.size
            create = convertTime(fs_object.info.meta.crtime)
            change = convertTime(fs_object.info.meta.ctime)
            modify = convertTime(fs_object.info.meta.mtime)
            data.append(["PARTITION {}".format(part), file_name, file_ext,
                         f_type, create, change, modify, size, file_path])
If the object is a directory, we need to recurse through it to access all of its sub-directories and files. To accomplish this, we append the directory name to the parent list. Then, we create a directory object using the as_directory() method. We use the inode here, which is, for all intents and purposes, a unique number, and check that the inode is not already in the dirs list. If it were, we would not process this directory, as it would have already been processed.
If the directory needs to be processed, we call the recurseFiles() method on the new sub_directory and pass it the current dirs, data, and parent variables. Once we have processed a given directory, we pop that directory from the parent list. Failing to do this would result in false file path details, as all of the former directories would continue to be referenced in the path unless removed.
Most of this function is wrapped in a large try-except block; we pass on any IOError exception generated during this process. Once we have iterated through all of the sub-directories, we return the data list to the openFS() function.
            if f_type == "DIR":
                parent.append(fs_object.info.name.name)
                sub_directory = fs_object.as_directory()
                inode = fs_object.info.meta.addr

                # This ensures that we don't recurse into a directory
                # above the current level and thus avoid circular loops.
                if inode not in dirs:
                    recurseFiles(part, fs, sub_directory, dirs, data,
                                 parent)
                parent.pop(-1)

        except IOError:
            pass
    dirs.pop(-1)
    return data
Let's briefly look at the convertTime() function. We've seen this type of function before: if the UNIX timestamp is not 0, we use the datetime.utcfromtimestamp() method to convert the timestamp into a human-readable format.
def convertTime(ts):
    if str(ts) == "0":
        return ""
    return datetime.utcfromtimestamp(ts)
With the active file listing data in hand, we are now ready to write it to a CSV file using the csvWriter() method. If we did find data (that is, the list is not empty), we open the output CSV file, write the headers, and loop through each list in the data variable. We use the writerows() method to write each nested list structure to the CSV file.
def csvWriter(data, output):
    if data == []:
        print "[-] No output results to write"
        sys.exit(3)

    print "[+] Writing output to {}".format(output)
    with open(output, "wb") as csvfile:
        csv_writer = csv.writer(csvfile)
        # Header order matches the order used in data.append() above
        headers = ["Partition", "File", "File Ext", "File Type",
                   "Create Date", "Change Date", "Modify Date", "Size",
                   "File Path"]
        csv_writer.writerow(headers)
        for result_list in data:
            csv_writer.writerows(result_list)
The screenshot below demonstrates the type of data this recipe extracts from forensic images.
There's more...
For this recipe, there are a number of improvements that could further increase its utility:
Use tqdm, or another library, to create a progress bar to inform the user of the current execution progress (a brief sketch of this approach follows this list).
Learn about the additional metadata values that can be extracted from filesystem objects using pytsk3 and add them to the output CSV file.
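For the first suggestion, here is a minimal sketch of how tqdm can wrap an iterable; it assumes tqdm has been installed separately, and the function name and arguments are illustrative rather than part of the recipe:

from tqdm import tqdm

def recurse_with_progress(root_dir):
    # Wrapping any iterable in tqdm() prints a live counter as it is consumed;
    # without a known total, it simply reports how many objects have been seen.
    results = []
    for fs_object in tqdm(root_dir, desc="Scanning", unit="obj"):
        results.append(fs_object)
    return results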
Summary
In summary, we have learned how to use pytsk3 to recursively iterate through any filesystem supported by the Sleuth Kit. This forms the basis of how we can use the Sleuth Kit to programmatically process forensic acquisitions. With this recipe in hand, we will be able to further interact with these files in future recipes.