The incident response process
There are many industry standards, recommendations, and best practices that can help you to create your own incident response process. You can use those as a reference to make sure you cover all the relevant phases for your type of business. The one that we are going to use as a reference in this book is the computer security incident response (CSIR) guidance in NIST Special Publication 800-61 Revision 2 (800-61R2). Regardless of the one you select to use as a reference, make sure to adapt it to your own business requirements. Most of the time, in security, the concept of “one size fits all” doesn’t apply; the intent is always to leverage well-known standards and best practices and apply them to your own context. It is important to retain the flexibility to accommodate your business needs so that the process is easier to operationalize.
While flexibility is key for adapting an incident response (IR) process to suit individual needs and requirements, it is still invaluable to understand the commonalities between different processes. There are a number of reasons to have an IR process in place, and there are certain steps that will help with both creating an incident response process and putting together an effective incident response team. Additionally, every incident has an incident life cycle, which can be examined to better understand why the incident occurred and how to prevent similar issues in the future. We will discuss each of these in more depth to give you a deeper understanding of how to form your own incident response process.
Reasons to have an IR process in place
Before we dive into more details about the process itself, it is important to be aware of the terminology that is used, and what the final goal is when using IR as part of enhancing your security posture. Let’s use a fictitious company to illustrate why this is important.
The following diagram shows a timeline of events. These events lead the help desk to escalate the issue and start the incident response process:
Figure 2.1: Events timeline leading to escalation and the beginning of the incident response process
The following table has some considerations about each step in this scenario:
| Step | Description | Security considerations |
| --- | --- | --- |
| 1 | While the diagram says that the system was working properly, it is important to learn from this event. | What is considered normal? Do you have a baseline that can give you evidence that the system was running properly? Are you sure there is no evidence of compromise before the email? |
| 2 | Phishing emails are still one of the most common methods used by cybercriminals to entice users to click on a link that leads to a malicious/compromised site. | While technical security controls must be in place to detect and filter these types of attacks, users must be taught how to identify a phishing email. |
| 3 | Many of the traditional sensors (IDS/IPS) used nowadays are not able to identify infiltration and lateral movement. | To enhance your security posture, you will need to improve your technical security controls and reduce the gap between infection and detection. |
| 4 | This is already part of the collateral damage done by this attack. Credentials were compromised, and the user was having trouble authenticating. This sometimes happens because the attackers already changed the user’s password. | There should be technical security controls in place that enable IT to reset the user’s password and, at the same time, enforce multifactor authentication. |
| 5 | Not every single incident is security-related; it is important for the help desk to perform their initial troubleshooting to isolate the issue. | If the technical security controls in place (step 3) were able to identify the attack, or at least provide some evidence of suspicious activity, the help desk wouldn’t have to troubleshoot the issue; it could just directly follow the incident response process. |
| 6 | At this point in time, the help desk is doing what it is supposed to do, collecting evidence that the system was compromised and escalating the issue. | The help desk should obtain as much information as possible about the suspicious activity to justify the reason why they believe that this is a security-related incident. |
| 7 | At this point, the IR process takes over and follows its own path, which may vary according to the company, industry segment, and standard. | It is important to document every single step of the process and, after the incident is resolved, incorporate the lessons learned with the aim of enhancing the overall security posture. |
Table 2.1: Security considerations for different steps in an events timeline
While there is much room for improvement in the previous scenario, there is something that exists in this fictitious company that many other companies around the world are missing: the incident response process itself. If it were not for the incident response process in place, support professionals would exhaust their troubleshooting efforts by focusing on infrastructure-related issues. Companies that have a good security posture would have an incident response process in place.
They would also ensure that the following guidelines are adhered to:
- All IT personnel should be trained to know how to handle a security incident.
- All users should be trained to know the core fundamentals of security in order to perform their job more safely, which will help them avoid getting infected.
- There should be integration between their help desk system and the incident response team for data sharing.
This scenario could have some variations that introduce different challenges to overcome. One variation would be if no indicator of compromise (IoC) was found in step 6. In this case, the help desk would most likely continue troubleshooting the issue. What if at some point “things” started to work normally again? Is this even possible? Yes, it is! When an IoC is not found, it doesn’t mean the environment is clean; now you need to switch gears and start looking for an indicator of attack (IoA), which involves looking for evidence that can show the intent of an attacker. When investigating a case, you may find many IoAs, which may or may not lead to an IoC. The point is that understanding the IoA will help you to better understand how an attack was executed, and how you can protect against it.
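The reasoning above can be sketched as a small triage function. This is a minimal illustration under stated assumptions, not a prescribed procedure: the `Finding` type, the `"ioc"` and `"ioa"` labels, and the returned action names are all hypothetical and would need to be adapted to your own help desk workflow.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    """One piece of evidence collected during troubleshooting (hypothetical shape)."""
    description: str
    kind: str  # "ioc" = indicator of compromise, "ioa" = indicator of attack


def triage(findings: List[Finding]) -> str:
    """Return the next action, based on the evidence collected so far."""
    if any(f.kind == "ioc" for f in findings):
        # Evidence of compromise: hand over to the incident response process.
        return "escalate"
    if any(f.kind == "ioa" for f in findings):
        # No IoC yet, but attacker intent is visible: keep investigating.
        return "investigate"
    # Finding nothing does not prove the environment is clean.
    return "hunt_for_ioa"
```

Note that the default branch never returns a "clean" verdict; in this sketch, the absence of an IoC only changes what you look for next.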
When an attacker infiltrates the network, they usually want to stay invisible, moving laterally from one host to another, compromising multiple systems, and trying to escalate privileges by compromising an account with administrative-level privileges. That’s the reason why it is so important to have good sensors not only in the network but also in the host itself. With good sensors in place, you would be able to not only detect the attack quickly but also identify activity that signals an imminent threat of a security policy violation.
In addition to all the factors that were just mentioned, some companies will soon realize that they must have an incident response process in place to be compliant with regulations that are applicable to the industry in which they belong. For example, the Federal Information Security Management Act (FISMA) of 2002 requires federal agencies to have procedures in place to detect, report, and respond to a security incident.
Creating an incident response process
Although the incident response process will vary according to the company and its needs, there are some fundamental aspects of it that will be the same across different industries.
The following diagram shows the foundational areas of the incident response process:
Figure 2.2: The incident response process and its foundational areas of Objective, Scope, Definition/Terminology, Roles and responsibilities, and Priorities/Severity Level
The first step in creating your incident response process is to establish the objective; in other words, to answer the question: what’s the purpose of this process? While this might appear redundant as the name seems to be self-explanatory, it is important that you are very clear as to the purpose of the process so that everyone is aware of what this process is trying to accomplish.
Once you have the objective defined, you need to work on the scope. Again, you start this by answering a question, which in this case is: To whom does this process apply?
Although the incident response process usually has a company-wide scope, it can also have a departmental scope in some scenarios. For this reason, it is important that you define whether this is a company-wide process or not.
Each company may have a different perception of a security incident; therefore, it is imperative that you have a definition of what constitutes a security incident, with examples for reference.
Along with the definition, companies must create their own glossary with definitions of the terminology used. Different industries will have different sets of terminologies, and if these terminologies are relevant to a security incident, they must be documented.
In an incident response process, the roles and responsibilities are critical. Without the proper level of authority, the entire process is at risk. The importance of the level of authority in an incident response is evident when you consider the question: Who has the authority to confiscate a computer in order to perform further investigation? By defining the users or groups that have this level of authority, you are ensuring that the entire company is aware of this, and if an incident occurs, they will not question the group that is enforcing the policy.
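One way to make this level of authority explicit is to record it as data that anyone can consult. The following is a minimal sketch; the action names, the group names, and the `is_authorized` helper are hypothetical examples, not a recommended assignment of authority.

```python
from typing import Dict, Set

# Hypothetical mapping of sensitive actions to the groups allowed to perform them.
AUTHORITY: Dict[str, Set[str]] = {
    "confiscate_computer": {"incident_response_team", "security_management"},
    "reset_user_password": {"help_desk", "incident_response_team"},
}


def is_authorized(group: str, action: str) -> bool:
    """Return True if the group has documented authority for the action."""
    # Unknown actions map to an empty set, so nobody is authorized by default.
    return group in AUTHORITY.get(action, set())
```

For example, `is_authorized("help_desk", "confiscate_computer")` is false in this sketch, which reflects the idea that authority must be defined in advance rather than assumed during an incident.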
Another important question concerns the severity of an incident. What defines a critical incident? The criticality will drive resource distribution, which raises another question: How are you going to distribute your staff when an incident occurs? Should you allocate more resources to incident “A” or to incident “B”? Why? These are only some examples of questions that should be answered in order to define the priorities and severity level. To determine the priorities and severity level, you will also need to take into consideration the following aspects of the business:
- Functional impact of the incident on the business: The importance of the affected system for the business will have a direct effect on the incident’s priority. All stakeholders for the affected system should be aware of the issue and will have their input in the determination of priorities.
- Type of information affected by the incident: Every time you deal with personally identifiable information (PII), your incident will have high priority; therefore, this is one of the first elements to verify during an incident. Another factor that can influence the severity is the type of data that was compromised based on the compliance standard your company is using. For example, if your company needs to be HIPAA compliant, you would need to raise the severity level if the data compromised was governed by the HIPAA standards.
- Recoverability: After the initial assessment, it is possible to give an estimate of how long it will take to recover from an incident. Depending on the amount of time to recover, combined with the criticality of the system, this could drive the priority of the incident to high severity.
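The three aspects above can be combined into a simple scoring function. This is only a sketch under assumed scales: the category names, the numeric weights, the thresholds, and the `Incident` and `severity` names are illustrative choices, and each company would need to define its own.

```python
from dataclasses import dataclass

# Illustrative scales only (assumptions, not taken from any standard).
FUNCTIONAL_IMPACT = {"none": 0, "low": 1, "medium": 2, "high": 3}
RECOVERABILITY = {"regular": 0, "supplemented": 1, "extended": 2, "not_recoverable": 3}


@dataclass(frozen=True)
class Incident:
    functional_impact: str  # key of FUNCTIONAL_IMPACT
    involves_pii: bool      # personally identifiable information affected
    regulated_data: bool    # e.g. data governed by HIPAA
    recoverability: str     # key of RECOVERABILITY


def severity(incident: Incident) -> str:
    """Return "low", "medium", or "high" for the given incident."""
    if incident.involves_pii or incident.regulated_data:
        # In this sketch, PII or regulated data always means high priority.
        return "high"
    score = (FUNCTIONAL_IMPACT[incident.functional_impact]
             + RECOVERABILITY[incident.recoverability])
    if score >= 4:
        return "high"
    if score >= 2:
        return "medium"
    return "low"
```

Checking the data type first mirrors the advice that the type of information affected is one of the first elements to verify; the remaining score combines system criticality with recovery time.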
In addition to these fundamental areas, an incident response process also needs to define how it will interact with third parties, partners, and customers.
For example, if an incident occurs and during the investigation process it is identified that a customer’s PII was leaked, how will the company communicate this to the media? In the incident response process, communication with the media should be aligned with the company’s security policy for data disclosure. The legal department should also be involved prior to the press release to ensure that there is no legal issue with the statement. Procedures to engage law enforcement must also be documented in the incident response process. When documenting this, take into consideration the physical location—where the incident took place, where the server is located (if appropriate), and the state. By collecting this information, it will be easier to identify the jurisdiction and avoid conflicts.
Incident response team
Now that you have the fundamental areas covered, you need to put the incident response team together. The format of the team will vary according to the company size, budget, and purpose. A large company may want to use a distributed model, where there are multiple incident response teams with each one having specific attributes and responsibilities. This model can be very useful for organizations that are geo-dispersed, with computing resources located in multiple areas. Other companies may want to centralize the entire incident response team in a single entity. This team will handle incidents regardless of the location. After choosing the model that will be used, the company will start recruiting employees to be part of the team.
The incident response process requires personnel with broad technical knowledge as well as deep knowledge in specific areas. The challenge is to find people with both depth and breadth, which sometimes leads to the conclusion that you need to hire external people to fill some positions, or even outsource part of the incident response team to a different company.
The budget for the incident response team must also cover continuous improvement via education, and the acquisition of proper tools, software, and hardware. As new threats arise, security professionals working with incident response must be ready and trained to respond well. Many companies fail to keep their workforce up to date, which may expose the company to risk. When outsourcing the incident response process, make sure the company that you are hiring is accountable for constantly training their employees in this field.
If you plan to outsource your incident response operations, make sure you have a well-defined service-level agreement (SLA) that meets the severity levels that were established previously. During this phase, you should also define the team coverage, assuming the need for 24-hour operations.
In this phase you will define:
- Shifts: How many shifts will be necessary for 24-hour coverage?
- Team allocation: Based on these shifts, who is going to work on each shift, including full-time employees and contractors?
- On-call process: It is recommended that you have on-call rotation for technical and management roles in case the issue needs to be escalated.
Defining these areas during this phase is particularly useful as it will allow you to more clearly see the work that the team needs to cover, and thus allocate time and resources accordingly.
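The coverage arithmetic can be sketched in a few lines. This is a minimal example under simple assumptions: shifts have equal length and do not overlap, and on-call duty rotates weekly in a fixed order. The function names and the sample names in the usage note are hypothetical.

```python
import math
from itertools import cycle, islice
from typing import Dict, List


def shifts_needed(shift_hours: int) -> int:
    """Number of equal-length shifts required to cover a full 24-hour day."""
    if shift_hours <= 0:
        raise ValueError("shift_hours must be positive")
    return math.ceil(24 / shift_hours)


def on_call_rotation(people: List[str], weeks: int) -> Dict[int, str]:
    """Assign one on-call person per week, cycling through the list in order."""
    if not people:
        raise ValueError("at least one person is required")
    # Weeks are numbered from 1; the list repeats once everyone has had a turn.
    return dict(enumerate(islice(cycle(people), weeks), start=1))
```

With 8-hour shifts, `shifts_needed(8)` gives three shifts per day. A call such as `on_call_rotation(["ana", "ben"], 3)` assigns week 1 to the first person, week 2 to the second, and week 3 back to the first; in practice you would keep separate rotations for technical and management roles.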
Incident life cycle
Every incident that starts must have an end, and what happens between the beginning and the end is a series of phases that will determine the outcome of the response process. This is an ongoing process that we call the incident life cycle. What we have described so far can be considered the preparation phase. However, this phase is broader than that: it also includes the partial implementation of security controls that were created based on the initial risk assessment (which should have been done even before creating the incident response process).
Also included in the preparation phase is the implementation of other security controls, such as:
- Endpoint protection
- Malware protection
- Network security
The preparation phase is not static, and you can see in the following diagram that this phase will receive input from post-incident activity. The post-incident activity is critical to improving the level of preparation for future attacks, because this is where you will perform a postmortem analysis to understand the root cause and see how you can improve your defense to prevent the same type of attack from happening in the future. The other phases of the life cycle and how they interact are also shown in this diagram:
Figure 2.3: Phases of the incident life cycle
The detection and containment phases could have multiple interactions within the same incident. Once the loop is over, you will move on to the post-incident activity phase. The sections that follow will cover these last three phases in more detail.
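The life cycle described above can be modeled as a small state machine. The following is a sketch based only on the phases named in this section; the `Phase` enum, the `ALLOWED` transition table, and the `advance` function are hypothetical names, and the exact transitions are an interpretation of the text rather than a reproduction of the figure.

```python
from enum import Enum
from typing import Dict, Set


class Phase(Enum):
    PREPARATION = "preparation"
    DETECTION = "detection"
    CONTAINMENT = "containment"
    POST_INCIDENT = "post-incident activity"


# Allowed transitions. Containment may loop back to detection within one
# incident, and post-incident activity feeds back into preparation.
ALLOWED: Dict[Phase, Set[Phase]] = {
    Phase.PREPARATION: {Phase.DETECTION},
    Phase.DETECTION: {Phase.CONTAINMENT},
    Phase.CONTAINMENT: {Phase.DETECTION, Phase.POST_INCIDENT},
    Phase.POST_INCIDENT: {Phase.PREPARATION},
}


def advance(current: Phase, target: Phase) -> Phase:
    """Move to the target phase, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Encoding the transitions as data makes the loop between detection and containment explicit, and it prevents an incident record from jumping straight from preparation to post-incident activity.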