During your malware investigation, when you come across a malware sample, you may want to know whether the malware sample belongs to a particular malware family or if it has characteristics that match with the previously analyzed samples. Comparing the suspect binary with previously analyzed samples or the samples stored in a public or private repository can give an understanding of the malware family, its characteristics, and the similarity with the previously analyzed samples.
While cryptographic hashing (MD5/SHA1/SHA256) is a great technique to detect identical samples, it does not help in identifying similar samples. Very often, malware authors change minute aspects of malware, which changes the hash value completely. The following sections describe some of the techniques that can help in comparing and classifying the suspect binary...