A cloud was born
Shortly after, between 2013 and 2015, the Windows Defender team began streaming Defender AV telemetry through the Windows telemetry collection pipeline. Soon after, they added telemetry from SCEP and MSRT (which, by then, were deployed on over a billion devices) to a data lake. This data lake was hosted on what can be considered an internal cloud (a precursor of Microsoft Azure) alongside Bing telemetry, and the raw telemetry was cooked to generate processed entity profiles, including file, process, and network profiles. This made it possible to query vast volumes of data and identify all occurrences of a given entity in a performant manner. The team also applied a real-time streaming analytics engine called Stream Insights to the incoming telemetry. This allowed them to perform real-time malware detection, creating one of the foundations for what is now called cloud-delivered protection, a major milestone in the evolution of Defender Antivirus into a true machine learning (ML)-powered, next-generation endpoint protection solution.
Around 2015, cloud operations for the product were moved to Microsoft’s ILDC, where Sense, the endpoint detection sensor in the Microsoft Defender for Endpoint (MDE) product, is developed today. Before Sense, SCEP could, in fact, act as an endpoint detection and response (EDR) sensor, but it required very aggressive cloud communication. Though this resulted in a heavyweight solution, because it had to scan before sending telemetry, it allowed Microsoft to develop the backend for Sense mentioned previously.
Cold snack
Profiles, or event types, introduced through the data lake effort can be found today inside MDE. The Defender Antivirus data lake effort, an early adopter of Microsoft’s Cosmos NoSQL database, greatly stimulated the development of EDR up to its official release in 2017. It remains in use today to support the staggering worldwide scale needed to protect hundreds of millions of machines. In fact, billions of requests are served daily, likely making the Defender cloud the largest-scale security solution on the planet today.
One of the key goals of establishing a data lake was to enable behavioral analysis of malware that was specifically designed to avoid detection. Emulation, a technique that simulates execution, can only go so far in collecting the signals needed to reach a verdict. A way to detect malware designed with obfuscation in mind was needed, which shifted the focus to the execution phase and into post-breach, away from physical attributes and toward behavioral detection.
The telemetry gathered in the data lake was augmented with process information from the antivirus and events from Event Tracing for Windows (ETW) to create profiles for files, network connections, and processes. These profiles were then matched against indicators of attack (IoAs).
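To make the idea of profiles and IoA matching more concrete, here is a minimal Python sketch. It is purely illustrative and does not reflect how the Defender backend is implemented: the event fields (`process`, `type`, `target`), the `ProcessProfile` class, and both IoA rules are hypothetical names invented for this example. It folds raw events into one profile per process and then tests each profile against a small set of behavioral rules.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessProfile:
    """Aggregated view of one process, built from raw telemetry events (hypothetical shape)."""
    process: str
    files_written: set = field(default_factory=set)
    remote_hosts: set = field(default_factory=set)
    children: set = field(default_factory=set)


def build_profiles(events):
    """Fold a sequence of raw events into one profile per process name."""
    profiles = {}
    for event in events:
        name = event["process"]
        profile = profiles.setdefault(name, ProcessProfile(name))
        if event["type"] == "file_write":
            profile.files_written.add(event["target"])
        elif event["type"] == "network_connect":
            profile.remote_hosts.add(event["target"])
        elif event["type"] == "process_create":
            profile.children.add(event["target"])
    return profiles


# Hypothetical IoAs: each rule is a name plus a predicate over a profile.
IOAS = {
    "office_spawns_shell": lambda p: (
        p.process == "winword.exe" and "powershell.exe" in p.children
    ),
    "drops_exe_and_beacons": lambda p: (
        any(f.endswith(".exe") for f in p.files_written) and bool(p.remote_hosts)
    ),
}


def match_ioas(profiles, ioas=IOAS):
    """Return {process: [matched IoA names]} for profiles that hit at least one IoA."""
    hits = {}
    for name, profile in profiles.items():
        matched = sorted(ioa for ioa, pred in ioas.items() if pred(profile))
        if matched:
            hits[name] = matched
    return hits


# Made-up sample telemetry; the IP address is from a documentation-only range.
EVENTS = [
    {"process": "winword.exe", "type": "process_create", "target": "powershell.exe"},
    {"process": "powershell.exe", "type": "file_write", "target": "C:\\Users\\Public\\a.exe"},
    {"process": "powershell.exe", "type": "network_connect", "target": "203.0.113.7"},
    {"process": "notepad.exe", "type": "file_write", "target": "C:\\notes.txt"},
]

hits = match_ioas(build_profiles(EVENTS))
print(hits)
```

The point of the sketch is the shift described above: neither rule looks at what a file is (its physical attributes), only at what a process did once it ran.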
Cold snack
Microsoft’s security operations center (SOC), the Cyber Defense Operations Center (CDOC), was one of the earliest adopters of what was then called the IOC Storyboard, an Excel workbook that allowed them to use the telemetry to pivot across entities and profiles and to hunt across the data. This extremely popular workbook was quickly adopted by other blue teams inside Microsoft. Today, Microsoft’s digital security division, covering everything from internal IT to security for customer-facing services such as Azure and Office 365, remains one of the biggest users of MDE and is a heavy driver of further product development.