Handling an incident
Handling an incident in the context of the IR life cycle includes the detection and containment phases.
To detect a threat, your detection system must be aware of current attack vectors. Since the threat landscape changes so rapidly, the detection system must be able to dynamically learn about new threats and new behaviors, and trigger an alert when suspicious activity is encountered.
While many attacks will be automatically detected by the detection system, the end user has an important role in identifying and reporting the issue if they find suspicious activity.
For this reason, the end user should also be aware of the different types of attacks and learn how to manually create an incident ticket to report such behavior. This should be part of the security awareness training.
Even with diligent users closely watching for suspicious activities, and with sensors configured to send alerts when a compromise attempt is detected, the most challenging part of the IR process is still accurately determining what is truly a security incident.
Oftentimes, you will need to manually gather information from different sources to see if the alert that you received really reflects an attempt to exploit a vulnerability in the system. Keep in mind that data gathering must be done in compliance with the company’s policy. In scenarios where you need to bring the data to a court of law, you need to guarantee the data’s integrity.
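One common way to help guarantee evidence integrity is to record a cryptographic hash of each collected artifact at acquisition time, so that any later modification can be detected. The sketch below is a minimal, illustrative example using Python's standard `hashlib`; the function name `evidence_digest` is a hypothetical helper, not part of any forensic toolkit.

```python
import hashlib

def evidence_digest(path: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of an evidence file, read in chunks
    so that large captures do not need to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in fixed-size chunks until EOF
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()
```

Recording this digest when the evidence is first collected, and recomputing it before the evidence is presented, lets you demonstrate that the data was not altered in the meantime.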
The following diagram shows an example where the combination and correlation of multiple logs is necessary in order to identify the attacker’s ultimate intent:
Figure 2.4: The necessity of multiple logs in identifying an attacker’s ultimate intent
In this example, we have many indicators of compromise (IoCs), and when we put all the pieces together, we can validate the attack. Keep in mind that, depending on the level of information you collect in each of those phases and how conclusive it is, you may not have evidence of compromise, but you will have evidence of an attack, which is the indicator of attack (IoA) for this case.
The following table explains the diagram in more detail, assuming that there is enough evidence to determine that the system was compromised:
| Step | Log | Attack/Operation |
|------|-----|------------------|
| 1 | Endpoint protection and operating system logs can help determine the IoC | Phishing email |
| 2 | Endpoint protection and operating system logs can help determine the IoC | Lateral movement followed by privilege escalation |
| 3 | Server logs and network captures can help determine the IoC | Unauthorized or malicious processes could read or modify the data |
| 4 | Assuming there is a firewall between the cloud and on-premises resources, the firewall log and the network capture can help determine the IoC | Data extraction and submission to command and control |
Table 2.2: Logs used to identify the attacks/operations of a threat actor
As you can see, there are many security controls in place that can help determine indicators of compromise. However, putting them all together in an attack timeline and cross-referencing the data can be even more powerful.
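Building that attack timeline usually means merging events from several log sources and sorting them chronologically. The sketch below is a minimal illustration of the idea; the event records, field names, and timestamps are hypothetical and stand in for whatever your endpoint, server, and firewall sensors actually emit.

```python
from datetime import datetime

# Hypothetical event records from different sensors (illustrative only)
endpoint_events = [
    {"time": "2023-04-01T09:02:11", "source": "endpoint", "event": "phishing email attachment opened"},
    {"time": "2023-04-01T09:40:05", "source": "endpoint", "event": "privilege escalation detected"},
]
server_events = [
    {"time": "2023-04-01T10:15:33", "source": "server", "event": "unauthorized process read sensitive data"},
]
firewall_events = [
    {"time": "2023-04-01T10:22:47", "source": "firewall", "event": "outbound transfer to unknown host"},
]

def build_timeline(*sources):
    """Merge events from every log source into one chronological timeline."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda e: datetime.fromisoformat(e["time"]))

timeline = build_timeline(endpoint_events, server_events, firewall_events)
for e in timeline:
    print(e["time"], e["source"], e["event"])
```

Reading the merged output top to bottom reveals the sequence of the attack across systems, which no single log would show on its own.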
This brings back a topic that we discussed in the previous chapter: detection is becoming one of the most important security controls for a company. Sensors located across the network (on-premises and in the cloud) will play a big role in identifying suspicious activity and raising alerts. A growing trend in cybersecurity is leveraging security intelligence and advanced analytics to detect threats more quickly and reduce false positives, which saves time and enhances overall accuracy.
Ideally, the monitoring system will be integrated with the sensors to allow you to visualize all events on a single dashboard. This might not be the case if you are using different platforms that don’t allow interaction between one another.
In a scenario like the one presented in Figure 2.4, the integration between the detection and monitoring system can help to connect the dots of multiple malicious actions that were performed in order to achieve the final mission—data extraction and submission to command and control.
Once the incident is detected and confirmed as a true positive, you need to either collect more data or analyze what you already have. If this is an ongoing issue, where the attack is taking place at that exact moment, you need to obtain live data from the attack and rapidly provide remediation to stop the attack. For this reason, detection and analysis are sometimes done almost in parallel to save time, and this time is then used to rapidly respond.
The biggest problem arises when you don’t have enough evidence that there is a security incident taking place, and you need to keep capturing data in order to validate its veracity. Sometimes the incident is not detected by the detection system. Perhaps it is reported by an end user, but they can’t reproduce the issue at that exact moment. There is no tangible data to analyze, and the issue is not happening at the time you arrive. In scenarios like this, you will need to set up the environment to capture data, and instruct the user to contact support when the issue is actually happening.
You can’t determine what’s abnormal if you don’t know what’s normal. In other words, if a user opens a new incident saying that the server’s performance is slow, you must know all the variables before you jump to a conclusion. To know if the server is slow, you must first know what’s considered to be a normal speed. This also applies to networks, appliances, and other devices. In order to establish this understanding, make sure you have the following in place:
- System profile
- Network profile/baseline
- Log-retention policy
- Clock synchronization across all systems
Based on this, you will be able to establish what’s normal across all systems and networks. This will be very useful when an incident occurs, and you need to determine what’s normal before starting to troubleshoot the issue from a security perspective.
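Once a baseline exists, "abnormal" can be expressed as a deviation from it. The sketch below illustrates one simple approach, assuming you have historical samples of a metric such as server response time; the function name and the three-standard-deviation threshold are illustrative choices, not a prescribed rule.

```python
from statistics import mean, stdev

def is_abnormal(samples, current, threshold=3.0):
    """Flag a reading that deviates more than `threshold` standard
    deviations from the historical baseline."""
    mu = mean(samples)
    sigma = stdev(samples)
    return abs(current - mu) > threshold * sigma

# Hypothetical baseline of server response times in milliseconds
baseline = [120, 115, 130, 125, 118, 122, 127]

print(is_abnormal(baseline, 128))  # within the normal range
print(is_abnormal(baseline, 450))  # well outside the baseline
```

This is why the profile and baseline items above matter: without the `baseline` samples, there is no objective way to say whether 450 ms is slow for this particular server.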
Incident handling checklist
Often, the simple things make a big difference when it comes time to determine what to do now and what to do next. That’s why having a simple checklist to go through is very important to keep everyone on the same page. The list below is not definitive; it is only a suggestion that you can use as a foundation to build your own checklist:
1. Determine whether an incident has actually occurred and start the investigation:

   1.1 Analyze the data and potential indicators (IoAs and IoCs).

   1.2 Review potential correlations with other data sources.

   1.3 Once you determine that an incident has occurred, document your findings and prioritize its handling based on its criticality, taking into consideration the impact and the recoverability effort.

   1.4 Report the incident to the appropriate channels.

2. Make sure you gather and preserve evidence.

3. Perform incident containment:

   3.1 Examples of incident containment include:

   3.1.1 Quarantining the affected resource

   3.1.2 Resetting the password for the compromised credential

4. Eradicate the incident using the following steps:

   4.1 Ensure that all vulnerabilities that were exploited are mitigated.

   4.2 Remove any malware from the compromised system and evaluate that system’s level of trustworthiness. In some cases, it will be necessary to fully reformat the system, as you may no longer be able to trust it.

5. Recover from the incident:

   5.1 There might be multiple recovery steps, depending on the incident. Generally speaking, they may include:

   5.1.1 Restoring files from backup

   5.1.2 Ensuring that all affected systems are fully functional again

6. Perform a post-incident analysis:

   6.1 Create a follow-up report with all lessons learned.

   6.2 Ensure that you implement actions to enhance your security posture based on those lessons learned.
As mentioned previously, this list is not exhaustive, and these steps should be tailored to suit specific needs. However, this checklist provides a solid baseline to build on for your own incident response requirements.
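If you want to track such a checklist during a live incident, a very small amount of code can keep the team aligned on what the next step is. The sketch below is a minimal, hypothetical model of the six top-level steps above; the class names and step descriptions are illustrative, and a real team would typically use its ticketing system instead.

```python
from dataclasses import dataclass, field

@dataclass
class ChecklistItem:
    description: str
    done: bool = False

@dataclass
class IncidentChecklist:
    items: list = field(default_factory=list)

    def complete(self, index: int) -> None:
        """Mark the step at the given position as finished."""
        self.items[index].done = True

    def next_step(self):
        """Return the first unfinished step, or None when all are done."""
        for item in self.items:
            if not item.done:
                return item.description
        return None

# The six top-level steps from the checklist above
checklist = IncidentChecklist([
    ChecklistItem("Determine if an incident occurred and investigate"),
    ChecklistItem("Gather and preserve evidence"),
    ChecklistItem("Perform incident containment"),
    ChecklistItem("Eradicate the incident"),
    ChecklistItem("Recover from the incident"),
    ChecklistItem("Perform a post-incident analysis"),
])

checklist.complete(0)
print(checklist.next_step())  # → Gather and preserve evidence
```

The same structure extends naturally to the sub-steps (1.1, 3.1.1, and so on) by nesting items, if your process needs that level of tracking.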