Hashing is a critical component of the DFIR workflow. While most use cases of hashing focus on integrity checking, similarity analysis allows us to learn more about near matches and relationships between files. This process can provide insight for malware detection, identification of restricted documents in unauthorized locations, and discovery of closely related items based on content alone. Through the use of third-party libraries, we're able to combine the speed of C-based code with the flexibility of the Python interpreter and build powerful tools that are both user- and developer-friendly. The code for this project can be downloaded from GitHub or Packt, as described in the Preface.
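To make the difference between integrity checking and similarity analysis concrete, the following is a minimal sketch using only the standard library. It is not the ssdeep or spamsum algorithm: it hashes fixed-size blocks and compares the resulting sets, which is a simplifying assumption that only tolerates in-place changes, not insertions or deletions. The names `block_hashes` and `similarity`, the 64-byte block size, and the sample data are all hypothetical and chosen for illustration.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 64) -> set:
    """Return the set of SHA-256 digests of each fixed-size block.

    Toy stand-in for piecewise hashing; real fuzzy hashes such as
    ssdeep choose block boundaries from the content instead.
    """
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def similarity(a: bytes, b: bytes, block_size: int = 64) -> float:
    """Jaccard similarity of the two block-hash sets, from 0.0 to 1.0."""
    hashes_a = block_hashes(a, block_size)
    hashes_b = block_hashes(b, block_size)
    if not hashes_a and not hashes_b:
        return 1.0  # two empty inputs are treated as identical
    return len(hashes_a & hashes_b) / len(hashes_a | hashes_b)


# Build sample data (hypothetical), then flip a single byte in a copy.
original = b"".join(
    b"record %04d: some log content here\n" % i for i in range(200)
)
tampered = bytearray(original)
tampered[100] ^= 0xFF
modified = bytes(tampered)

# Integrity check: the whole-file digests no longer match at all.
digest_original = hashlib.sha256(original).hexdigest()
digest_modified = hashlib.sha256(modified).hexdigest()
print("Digests match:", digest_original == digest_modified)

# Similarity check: most blocks are still shared between the two inputs.
print("Block similarity: %.3f" % similarity(original, modified))
```

The whole-file digest answers only whether two inputs are identical, while the block-level comparison reports how much content they still share, which is the kind of near-match signal similarity analysis is after.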
A fuzzy hash is a form of metadata, or data about data. Metadata also includes embedded attributes such as document editing time, image geolocation information, and source application...