The value of a detection engineering program
Before a detection engineering program can be established, it must be justified to stakeholders in the organization so that funding can be received. This section will discuss the importance of detection engineering. Specifically, we will look at the increasing need for good detections, how we define the quality of a detection, and how a detection engineering program fills this need.
The need for better detection
Advancements in software development such as open source, cloud computing, infrastructure as Code (IaC), and continuous integrations/continuous deployment (CI/CD) pipelines have reaped benefits for organizations. These advancements allow organizations to easily build upon the technology of others, frequently deploy new versions of their software, quickly stand up and break down infrastructure, and adapt quickly to changes in their landscape.
Unfortunately, these same advancements have aided threat actors as well. Open source repositories provide a plethora of offensive tools. Cloud computing and IaC allow adversaries to quickly deploy and break down their C2 infrastructure, while advances in software processes and automation have increased their ops tempo in updating and creating new capabilities. These changes have further deteriorated the value of static indicators and necessitate the need for better, more sophisticated detections. As such, the field of detection engineering is beginning to evolve to support efforts for more sophisticated detections. With an effective detection engineering program, organizations can go beyond detecting static indicators and instead detect malicious activity at a technique level.
The qualities of good detection
There is no one definition for good detection. Individual cyber security organizations will have varying thresholds for false positive rates – that is, the rate of detections triggering when they shouldn’t. Additionally, the adversaries they face will differ in sophistication, and the visibility and tools at their disposal will vary. As a detection engineer, you must identify metrics and evaluation criteria that align with your organization’s needs. In Chapter 9, we will review processes and approaches that will help guide those decisions. These evaluation criteria can be broken into three areas:
- The ability to detect the adversary
- The cost of that ability to the cyber security organization
- The cost to the adversary to evade that detection
The ability to detect the adversary can be broken into a detection’s coverage, or the scope of the activity that the detection identifies. This can most easily be understood in terms of MITRE ATT&CK. As mentioned earlier, the framework provides definitions at varying levels of specificity, starting with tactics as the most general grouping, broken into techniques, and then procedures as the most fine-grained classification. Most behavioral detections focus on detecting one or more procedures taken by an adversary to implement a technique. Increasing a detection’s coverage by detecting multiple procedures associated with a technique or creating a detection that works across multiple techniques often increases the complexity of the detection but can also improve a detection’s durability.
Where a detection’s coverage can be thought of as the surface area across the MITRE ATT&CK TTPs, the durability of the detection identifies how long the detection is expected to be effective. Understanding the volatility of an adversary’s infrastructure, tools, and procedures and the relative cost to change them can help predict the durability of a detection.
These two evaluation criteria define what portion of attacks we can detect and for how long we expect those detections to be effective. Unfortunately, quantifying these evaluation criteria into metrics requires complete knowledge of an adversary’s capabilities and their ops tempo to change those capabilities. Despite this, we can use these criteria to rank the effectiveness and quality of our detections as we strive to improve our ability to detect the adversary.
However, we can calculate an organization’s historical effectiveness by calculating our mean time to detection as the time from the start of the attack on the organization to the time it took to detect the adversary.
Our ability to detect the adversary does not come without costs to the cyber security organization. These costs can be realized in the creation, running, and maintenance of detections, the resources spent reviewing associated alerts, and the actions taken based on those alerts. Later in this chapter, we will review the workflow of detection engineering. The creation time to perform that workflow defines the costs for creating that detection. For example, researching approaches to a technique is necessary to improve the coverage and durability of a detection but also increase the cost of creation. As a detection engineer, understanding the complexity of the detection affects future analysts’ abilities to understand and maintain the detection. It also affects the efficiency of running the detection (both positively and negatively). Maintaining the detections within an organization is an ongoing process. Staleness can be used to define the continued effectiveness or value of a detection. Is that technique or tool still being actively used? Is the detection used to detect something that is fully patched or protect infrastructure/software that is no longer on your network?
Each alert that an analyst must review comes at a cost. The confidence of a detection measures the probability that the alert is a true positive – that is, the alert is triggered under the expected conditions. However, tuning a detection to reduce the false positive rate can decrease the detection’s coverage and result in not identifying the attack. In contrast, the noisiness of a detection identifies how often a detection creates an alert that does not result in remediation. The noisiness of a detection might result from low confidence – that is, a high false positive rate – but it could also be related to the impact of the detection. Understanding the potential impact allows us to measure the importance or severity of what has been detected.
For example, a detection might identify reconnaissance scanning of the network. The lack of actionability on this activity, despite the confidence in the detection, might result in the noisiness of the detection being unacceptable. Each organization must identify its tolerance for false positives when tuning its detections. However, confidence in detection and the associated potential impact can be used to prioritize an organization’s alerts. In Chapter 5, we will review how low-fidelity detections can be valuable without significantly affecting analyst productivity.
The actionability of a detection defines how easy it is for a SOC analyst to leverage the detection to either further analyze the threat or remediate it. This does not mean that every detection must have an immediate action or response. A detection may have such significantly low confidence that it is ineffective to immediately investigate or respond to. Instead, the action associated with the alert is to increase confidence in other related identified activities or support potential root cause analysis. Unactionable intelligence, however, has limited value. The specificity of a detection supports this actionability by explaining what was detected. As an example, a machine learning model may provide increased coverage in detection with a high confidence level but may be unable to explain specifically why the alert was created. This lack of specificity, such as identifying the malware family, could reduce the actionability by not identifying the capabilities, persistence mechanisms, or other details about the malware required to properly triage or remediate the threat.
Lastly, when evaluating a detection, we must look at the cost to the adversary. While we will not, in most cases, have an inside look at the detailed costs associated with implementing an attack, we can look at indirect evidence in determining adversary cost. Inherent knowledge of how easily an adversary can evade detection, such as referencing the Pyramid of Pain, can provide guidance for ranking the cost to the adversary. As an example, the cost of changing a malware hash is significantly less than the cost of changing the malware’s C2 protocol. The volatility of an attacker’s infrastructure, tools, and procedures measures how often the attacker changes their attack in a way that would mitigate the detection. Identifying parts of an attack with lower volatility allows the defender to increase the durability of their detection.
The benefits of a detection engineering program
When selling the concept of a detection engineering program to executives, there’s only one justification that matters: a detection engineering program dramatically reduces the risk that a sophisticated adversary can penetrate their network and wreak havoc on their company. While this should be true about every aspect of your cyber security organization, each organization achieves this differently. A detection engineering program differs from other aspects of a cyber security program by allowing organizations to respond to new attacks quickly. It can leverage internal intelligence about adversaries targeting their industry and specifics about their company’s network to customize detections.
While detection solutions from any given vendor are typically bundled with vendor-provided detections, these detections are created based on a customer-agnostic approach to detection. They are written in such a way that they can be mass-distributed to client devices without impacting business. As such, vendor detections focus on rules and signatures that can apply to any environment. However, this does not catch the edge cases; that is where detection engineering within your organization comes in. By establishing a detection engineering program, you have control over the focus of your detections and can plan them in such a way that they cover the use cases specific to your environment. For example, the vendors cannot block all logins from foreign countries as that would impact their client base negatively. However, an internal detection engineering team can determine that there should not be any foreign logins and write the detection accordingly. We’ll dive deeper into designing detections tailored to your environment in Chapter 2 and Chapter 5.
In addition to this core benefit, there are additional secondary benefits that cyber security organizations can expect from a well-established detection engineering program. Specifically, we will dive into the following key advantages:
- Standardized and version-controlled detection code
- Automated testing
- Cost and time savings
Let’s take a look.
Standardized and version-controlled detection code
As part of building a detection engineering program, you will set the standards for how detections are written. This allows the code to be easily understood and compatible with detection solutions, regardless of the author of the detection. Without such standardization, the author will write rules at their discretion, potentially confusing peers trying to interpret the rule.
Furthermore, a detection repository will be leveraged so that all code is version-controlled and peer-reviewed and tested before it’s implemented in production. Maintaining a centralized repository of detection code reduces the chance of untested changes or rules being introduced into production environments and makes it easier to track any problematic code. We’ll discuss maintaining a repository of detections in Chapter 5.
Automated testing
By automating detection testing, we reduce the risk of new or modified detection code introducing errors into production environments. Furthermore, the more automation that is integrated into the environment, the less time detection engineers must spend manually testing code. The detection validation process will be thoroughly discussed in Part 3.
Cost and time savings
The cost and time savings of detection engineering are major selling points to stakeholders. For any funding provided to the program, stakeholders and management will look for a return on investment (ROI) as soon as possible. This ROI comes in the form of cost and time savings resulting from numerous factors. For example, automated testing will improve the quality of detections. This will reduce the time spent on testing detections and it will reduce the time the analysts spend responding to bad detections.
The largest cost and time savings deduction results from reducing the probability of a network breach. Reducing the risk of a breach by implementing well-developed detections reduces the risk of the cost associated with breaches.
In this section, we demonstrated the value of a detection engineering program and the benefits this has for an organization that implements such a program. The next section will close out this chapter by outlining the material you can expect to see throughout this book.