This is probably the best-known file format found in documents associated with various older and newer Microsoft Office products, such as .doc (Microsoft Word), .xls (Microsoft Excel), .ppt (Microsoft PowerPoint), and others. Once completely proprietary, it was later released to the public, and its specification can now be found online. Let's go through the parts of it that matter most for malware analysis.
The Compound File Binary (CFB) format provides a filesystem-like structure for storing application-specific streams of data. Here is its header structure according to the official documentation:
- Header signature (8 bytes): Magic value, always \xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1 (the first 4 bytes, written in hex as D0CF11E0, resemble the string DOCFILE0)
- Header CLSID (16 bytes): Unused class ID, must be zero
- Minor version (2 bytes): Should be set to 0x003E when the major version is 3 or 4
- Major version (2 bytes): Main version number, can be either 0x0003...
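
As a rough illustration of the fields listed so far, the following minimal sketch reads the first 28 bytes of a file and checks them against the values described above. The function name `parse_cfb_header_prefix` and the returned dictionary keys are hypothetical, chosen only for this example. It assumes the integer fields are stored in little-endian byte order and deliberately ignores everything after the major version field.

```python
import struct

# Magic value from the header signature field described above.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# 8-byte signature + 16-byte CLSID + two 2-byte version fields.
PREFIX_SIZE = 28


def parse_cfb_header_prefix(data: bytes) -> dict:
    """Parse only the first four header fields (hypothetical helper).

    Assumption: the 2-byte fields are little-endian unsigned integers.
    """
    if len(data) < PREFIX_SIZE:
        raise ValueError("need at least 28 bytes to read these fields")

    signature, clsid, minor, major = struct.unpack_from("<8s16sHH", data, 0)

    return {
        "signature_ok": signature == CFB_MAGIC,
        # The CLSID is documented as unused and zeroed, so a non-zero
        # value is worth a closer look during analysis.
        "clsid_is_zero": clsid == bytes(16),
        "minor_version": minor,
        "major_version": major,
    }


if __name__ == "__main__":
    # Synthetic header prefix built in memory, not a real document.
    sample = CFB_MAGIC + bytes(16) + struct.pack("<HH", 0x003E, 0x0003)
    print(parse_cfb_header_prefix(sample))
```

In practice, a quick check like this is mostly useful for triage: a mismatched signature means the file is not a compound file at all, while unexpected values in the remaining fields may hint at a malformed or deliberately crafted document.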