Hashing data is a common technique in the forensics community to fingerprint a file. Normally, we create a hash of an entire file; however, in the script we'll build later in this chapter, we'll hash segments of a file to evaluate the similarity between two files. Before diving into the complexities of fuzzy hashing, let's walk through how Python can generate cryptographic hashes such as MD5 and SHA1 values.
Background on hashing
Hashing files in Python
As previously discussed, there are multiple algorithms commonly used by the DFIR community and tools. Before generating a file hash, we must decide which algorithm we would like to use. This can be a tough question, as there are multiple factors to consider....