Learning Python for Forensics

You're reading from Learning Python for Forensics: Leverage the power of Python in forensic investigations

Product type: Paperback
Published in: Jan 2019
Publisher: Packt
ISBN-13: 9781789341690
Length: 476 pages
Edition: 2nd Edition
Authors (2): Preston Miller, Chapin Bryce

When to use Python

Python is a powerful forensic tool. However, before deciding to develop a script, it is important to consider the type of analysis required and the project timeline. In the examples that follow, we outline situations where Python is invaluable and, conversely, where it is not worth the development effort. Although rapid development makes it easy to deploy a solution in a tough situation, Python is not always the best tool for the job. If an existing, available tool already performs the task at hand, it may be the more appropriate choice for analysis.

Python is a preferred programming language for forensics due to its ease of use, library support, detailed documentation, and interoperability across operating systems. Programming languages fall into two broad types: interpreted and compiled. Compilation translates source code into machine code, a lower-level form that the computer's processor can execute directly and efficiently. Interpreted languages are generally slower than compiled languages at runtime, but they do not require a separate compilation step, which can take some time. Because Python is an interpreted language, we can modify our code and immediately run it to view the results. (Strictly speaking, the standard CPython interpreter compiles source code to bytecode behind the scenes, but this happens automatically, with no separate build step.) With a compiled language, we would have to wait for our code to recompile before seeing the effect of our modifications. For this reason, Python may not run as quickly as a compiled language, but it allows for rapid prototyping.

An incident response case is an excellent real-life example of when to use Python. Consider a client who calls in a panic to report a data breach and is unsure how many files were exfiltrated from their file server over the past 24 hours. Once on site, you are instructed to count, as quickly as possible, the files accessed in the past 24 hours, because this count, along with the list of compromised files, will determine the course of action.

Python fits the bill nicely here. Armed with just a laptop, you can open a text editor and begin writing a solution; a Python script can be written and run without a fancy editor or toolset. The build process of your script may look like this, with each step building upon the previous one:

  1. Make the script read a single file's last-accessed timestamp
  2. Write a loop that steps through directories and subdirectories
  3. Test each file to see if that timestamp is from the past 24 hours
  4. If it has been accessed within the past 24 hours, add it to a list of affected files, recording each file's path and access time for display

This process results in a script that recurses over the entire server and outputs, for manual review, every file with a last-accessed time in the past 24 hours. Such a script would likely be about 20 lines of code and take an intermediate scripter 10 minutes or less to develop and validate, which is clearly more efficient than manually reviewing timestamps on the filesystem.
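The four build steps can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `recently_accessed` and its `window_seconds` and `now` parameters are hypothetical, and it relies only on the standard library's `os.walk` and `os.stat`. Keep in mind that access times are only as trustworthy as the system that records them: many Linux systems mount filesystems with the `relatime` option, and Windows may disable or delay last-access updates on NTFS, so the results still need to be validated against the environment.

```python
import os
import time


def recently_accessed(root, window_seconds=24 * 60 * 60, now=None):
    """Return (path, atime) pairs for files under ``root`` whose
    last-accessed time falls within the past ``window_seconds``.

    Hypothetical helper written for illustration only.
    """
    if now is None:
        now = time.time()
    cutoff = now - window_seconds
    hits = []
    # Step 2: step through the directory and all of its subdirectories
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Step 1: read the file's last-accessed timestamp
                # (os.stat reads metadata only, not file content)
                atime = os.stat(path).st_atime
            except OSError:
                # Unreadable or vanished file: skip it rather than crash
                continue
            # Step 3: test whether the timestamp is within the window
            if atime >= cutoff:
                # Step 4: record the path and access time for review
                hits.append((path, atime))
    return hits
```

For example, `recently_accessed("/mnt/fileserver")` (a hypothetical mount point) returns a list of `(path, atime)` tuples; passing each `atime` to `time.ctime()` gives a readable timestamp for the examiner.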

Before deploying any code you have developed, it is imperative that you first validate its capability. Because Python does not require a separate compilation step, we can easily run the script after adding new lines of code to ensure we haven't broken anything. This incremental approach, writing a little code and then immediately testing it, is common in script development. Any software, regardless of who wrote it, should be scrutinized and evaluated to ensure accuracy and precision. Validation ensures that the code operates properly and, although it is more time-consuming, produces reliable results that can withstand scrutiny in the courtroom, an important consideration in forensics.
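One way to validate the first build step (reading a single file's last-accessed timestamp) is to check it against test data whose answer is already known. The sketch below is an assumption about how such a check could look, not a prescribed procedure: the helper names `last_accessed` and `validate_last_accessed` are hypothetical. It creates a scratch file, sets its access time to a chosen value with `os.utime`, and confirms that the code reads the same value back.

```python
import os
import tempfile
import time


def last_accessed(path):
    """Return a file's last-accessed time in seconds since the epoch."""
    return os.stat(path).st_atime


def validate_last_accessed():
    """Compare last_accessed() against a file whose access time we set ourselves."""
    known_atime = time.time() - 12 * 60 * 60  # a known value: 12 hours ago
    with tempfile.TemporaryDirectory() as workdir:
        sample = os.path.join(workdir, "sample.bin")
        with open(sample, "wb") as handle:
            handle.write(b"test data")
        # Set (access time, modification time) explicitly so the expected answer is known
        os.utime(sample, (known_atime, known_atime))
        observed = last_accessed(sample)
    # Allow up to two seconds of difference for coarse timestamp resolution
    return abs(observed - known_atime) < 2


if __name__ == "__main__":
    print("PASS" if validate_last_accessed() else "FAIL")
```

A failing check here is itself useful information: it may reveal a filesystem that stores access times at a coarser resolution than expected.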

Python may not be the best tool for general case analysis. If you are handed a hard drive and asked to find evidence without additional insight, a pre-existing tool will be the better solution. Python is invaluable for targeted solutions, such as analyzing a given file type and creating a metadata report. Developing a custom all-in-one solution for a given filesystem takes too much time when other tools, both paid and free, already support such generic analysis.

Python is useful for pre-processing automation. If you find yourself repeating the same tasks for each piece of evidence, it may be worthwhile to develop a system that automates those steps. A great example of such a suite is ManTech's analysis and triage system (mantaray: http://github.com/mantarayforensics), which leverages a series of tools to create general reports that can speed up analysis when the scope of the data that may exist is unknown.
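The idea of automating repeated per-item tasks can be sketched on a small scale. This is not mantaray and does not reflect how that suite works; it assumes, purely for illustration, that the repetitive tasks are recording each evidence file's size and computing its MD5 and SHA-256 hashes. The names `hash_file`, `size_step`, `preprocess`, and `STEPS` are hypothetical.

```python
import hashlib
import os


def hash_file(path, algorithms=("md5", "sha256"), chunk_size=1024 * 1024):
    """Hash a file in chunks so large evidence files need not fit in memory."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def size_step(path):
    """Record the file size in bytes."""
    return {"size": os.path.getsize(path)}


# The repetitive tasks to run on every evidence item, in order (hypothetical examples)
STEPS = (size_step, hash_file)


def preprocess(evidence_paths, steps=STEPS):
    """Run every step on every evidence item and collect one record per item."""
    records = []
    for path in evidence_paths:
        record = {"path": path}
        for step in steps:
            record.update(step(path))
        records.append(record)
    return records
```

Keeping the steps in a tuple of plain functions means a new repetitive task can be added to the pipeline without touching the loop that drives it.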

When considering whether to commit resources to developing Python scripts, either on the fly or for larger projects, it is important to weigh what solutions already exist, the time available to create a solution, and the time saved through automation. Without a strong design plan, development can run much longer than initially anticipated, despite the best intentions.

Development life cycle

The development cycle involves at least five steps:

  • Identify
  • Plan
  • Program
  • Validate
  • Bugs

The first step is self-explanatory: before you develop, you must identify the problem that needs to be solved. Planning is perhaps the most crucial step in the development cycle.

Good planning will help later by decreasing the amount of code required and the number of bugs. Planning becomes even more vital during the learning process. A forensic programmer must begin to answer the following questions: how will data be ingested, what Python data types are most appropriate, are third-party libraries necessary, and how will the results be displayed to the examiner? In the beginning, just as if we were writing a term paper, it is a good idea to write, or draw, an outline of your program. As you become more proficient in Python, planning will become second nature, but initially, it is recommended to create an outline or write pseudocode.

Pseudocode is an informal way of outlining a program before filling in the details with actual code. Pseudocode can represent the bare bones of the program, such as defining pertinent variables and functions, while describing how they will all fit together within the script's framework. Pseudocode for a function might look like this:

# open the database
# read from the database using the sqlite3 library
# store in variable called records
for record in records:
    # process database records here
    pass
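Filling in the outline above might produce something like the following minimal sketch. The function names `read_records` and `process_records` and the `table` parameter are hypothetical, since the pseudocode does not say which table to read. The database is opened read-only through an SQLite `file:` URI with `mode=ro` so the examined file is not written to, though working on a copy of the evidence remains good practice.

```python
import sqlite3
from pathlib import Path


def read_records(db_path, table):
    """Open a SQLite database read-only and return every row from ``table``."""
    # Build a file: URI with mode=ro so SQLite refuses to write to the file
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # Identifiers cannot be bound as ? parameters, so quote the table name
        quoted = '"' + table.replace('"', '""') + '"'
        records = conn.execute(f"SELECT * FROM {quoted}").fetchall()
    finally:
        conn.close()
    return records


def process_records(records):
    """Placeholder processing step: print each row."""
    for record in records:
        # process database records here
        print(record)
```

Usage might look like `process_records(read_records("evidence_copy.db", "messages"))`, where both the file name and the table name are hypothetical.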

After identifying and planning, the next three steps make up the largest part of the development cycle. Once your program has been sufficiently planned, it is time to start writing code! Once the code is written, break in your new program with as much test data as possible. Especially in forensics, it is critical to test your code thoroughly instead of relying on the results of one example. Without comprehensive testing, the code can crash when it encounters something unexpected or, even worse, provide the examiner with false information and lead them down the wrong path. After the code has been tested, it is time to release it and prepare for bug reports. We are not talking about insects here! Despite a programmer's best efforts, there will always be bugs in the code. Bugs have a nasty way of multiplying even as you squash one, causing the development cycle to start over again and again.
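To illustrate testing with more than one example, the following sketch exercises a hypothetical `count_tables()` helper with a valid database, an empty file, a file of the wrong type, and a file that starts with the SQLite header but is otherwise garbage. The helper checks the 16-byte header string that every SQLite 3 database file begins with and catches `sqlite3.DatabaseError`, so unexpected input is reported as `None` rather than crashing the script. The helper names and test cases are assumptions chosen for illustration.

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 database file


def count_tables(db_path):
    """Return the number of tables in a SQLite file, or None if it is not a usable database."""
    with open(db_path, "rb") as handle:
        if handle.read(16) != SQLITE_MAGIC:
            return None  # not SQLite: report it instead of crashing
    conn = None
    try:
        conn = sqlite3.connect(db_path)
        query = "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
        (count,) = conn.execute(query).fetchone()
        return count
    except sqlite3.DatabaseError:
        return None  # header looked right, but the file is corrupt
    finally:
        if conn is not None:
            conn.close()


def run_tests():
    """Feed count_tables() several kinds of input; return the paths that failed."""
    with tempfile.TemporaryDirectory() as workdir:
        def make(name, data):
            path = os.path.join(workdir, name)
            with open(path, "wb") as handle:
                handle.write(data)
            return path

        good = os.path.join(workdir, "good.db")
        conn = sqlite3.connect(good)
        conn.execute("CREATE TABLE notes (body TEXT)")
        conn.commit()
        conn.close()

        expected = {
            good: 1,                                                # a valid database
            make("empty.db", b""): None,                            # zero-length file
            make("text.db", b"not a database\n"): None,             # wrong file type
            make("corrupt.db", SQLITE_MAGIC + b"\xff" * 84): None,  # right header, garbage body
        }
        return [path for path, want in expected.items() if count_tables(path) != want]


if __name__ == "__main__":
    failures = run_tests()
    print("all cases passed" if not failures else f"failed: {failures}")
```

Each new kind of problem input found in real casework can be added as another entry in `expected`, so the test set grows alongside the bug reports.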
