The collection process
Once the IR have been defined, we can proceed with collecting the raw data we need to fulfill them. For this process, we can consult two types of sources: internal sources (such as networks and endpoints) and external sources (such as blogs, threat intelligence feeds, threat reports, public databases, forums, and so on).
The most effective way to carry on the collection process is to use a collection management framework (CMF). Using a CMF allows you to identify data sources and easily track the type of information you are gathering for each. It can also be of use to rate the data that's been obtained from the source, including how long that data has been stored and to track how trustworthy and complete the source is. It is advised that you use the CMF to track not only the external sources, but also the internal ones. Here's an example of what one would look like:
Dragos analysts Lee, Miller, and Stacey wrote an interesting paper (https://dragos.com/wp-content/uploads/CMF_For_ICS.pdf?hsCtaTracking=1b2b0c29-2196-4ebd-a68c-5099dea41ff6|27c19e1c-0374-490d-92f9-b9dcf071f9b5) about using a CMF to explore different methodologies and examples. Another great resource available that can be used to design an advanced collection process is the Collection Management Implementation Framework (https://studylib.net/doc/13115770/collection-management-implementation-framework-what-does-...), designed by the Software Engineering Institute.
Indicators of compromise
So far, we've talked about finding the IR and how to use a CMF. But what data are we going to collect?
An indicator of compromise (IOC), as the name suggests, is an artifact that's been observed in a network or in an operating system that, with high confidence, indicates that it has been compromised. This forensic data is used to understand what happened, but if collected properly, it can also be used to prevent or detect ongoing breaches.
Typical IOCs may include hashes of malicious files, URLs, domains, IPs, paths, filenames, Registry keys, and malware files themselves.
It is important to remember that, in order to be really useful, it is necessary to provide context for the IOCs that have been collected. Here, we can follow the mantra quality over quantity – a huge amount of IOCs does not always mean better data.
Understanding malware
Malware, short for malicious software, is not everything, but it can be an incredibly valuable source of information. Before we look at the different types of malware, it is important for us to understand how malware typically works. Here, we need to introduce two concepts: the dropper and the Command and Control (C2 or C2C).
A dropper is a special type of software designed to install a piece of malware. We will sometimes talk about single-staged and two-stage droppers, depending on whether or not the malware code is contained in the dropper. When the malicious code is not contained within the dropper, it will be downloaded to the victim's device from an external source. Some security researchers may call this two-stage type of dropper a downloader, while referring to a two-stage dropper as the one that requires further steps to put different pieces of code together (by decompressing or executing different pieces of code) to build a final piece of malware.
The Command and Control (C2) is an attacker-controlled computer server that's used to send commands to the malware running in the victim's systems. It's the way the malware communicates with its "owner." There are multiple ways that a C2 can be established and, depending on the malware's capabilities, the complexity of the commands and the communication that can be established may vary. For example, threat actors have been seen using cloud-based services, emails, blog comments, GitHub repositories, and DNS queries, among other things, for C2 communication.
There are different types of malware according to their capabilities, and sometimes, one malware piece can be classified as more than one type. The following is a list of the most common ones:
- Worm: An autonomous program capable of replicating and propagating itself through the network.
- Trojan: A program that appears to serve a designated purpose, but also has a hidden malicious capability to bypass security mechanisms, thus abusing the authorization that's been given to it.
- Rootkit: A set of software tools with administrator privileges, designed to hide the presence of other tools and hide their activities.
- Ransomware: A computer program designed to deny access to a system or its information until a ransom has been paid.
- Keylogger: Software or hardware that records keyboard events without the user's knowledge.
- Adware: Malware that offers the user specific advertising.
- Spyware: Software that has been installed onto a system without the knowledge of the owner or the user, with the intention of gathering information about him/her and monitoring his/her activity.
- Scareware: Malware that tricks computer users into visiting compromised websites.
- Backdoor: The method by which someone can obtain administrator user access in a computer system, a network, or a software application.
- Wiper: Malware that erases the hard drive of the computer it infects.
- Exploit kit: A package that's used to manage a collection of exploits that could use malware as a payload. When a victim visits a compromised website, it evaluates the vulnerabilities in the victim's system in order to exploit certain vulnerabilities.
A malware family references a group of malicious software with common characteristics and, most likely, the same author. Sometimes, a malware family can be directly related to a specific threat actor. Sometimes, malware (or a tool) is shared among different groups. This happens a lot with open source malware tools that are publicly available. Leveraging them helps the adversary disguise its identity.
Now let's take a quick look to how we can collect data around pieces of malware.
Using public sources for collection – OSINT
Open Source Intelligence (OSINT) is the process of collecting publicly available data. The most common sources that come to mind when talking about OSINT are social media, blogs, news, and the dark web. Essentially, any data that's made publicly available can be used for OSINT purposes.
Important Note
There are many great resources for someone looking to start collecting information: VirusTotal (https://www.virustotal.com/), CCSS Forum (https://www.ccssforum.org/), and URLHaus (https://urlhaus.abuse.ch/) are great places to get started with the collection process.
Also, take a look at OSINTCurio.us (https://osintcurio.us/) to learn more about OSINT resources and techniques.
Honeypots
A honeypot is a decoy system that imitates possible targets of attacks. A honeypot can be set up to detect, deflect, or counteract an attacker. All traffic that's received is considered malicious and every interaction with the honeypot can be used to study the attacker's techniques.
There are many types of honeypots (an interesting list can be found here: https://hack2interesting.com/honeypots-lets-collect-it-all/), but they are mostly divided into three categories: low interaction, medium interaction, and high interaction.
Low interaction honeypots simulate the transport layer and provide very limited access to the operating system. Medium interaction honeypots simulate the application layer in order to lure the attacker into sending the payload. Finally, high interaction honeypots usually involve real operating systems and applications. These ones are better for uncovering the abuse of unknown vulnerabilities.
Malware analysis and sandboxing
Malware analysis is the process of studying the functionality of malicious software. Typically, we can distinguish between two types of malware analysis: dynamic and static.
Static malware analysis refers to analyzing the software that's used without executing it. Reverse engineering or reversing is a form of static malware analysis and is performed using a disassembler such as IDA or the more recent NSA tool, Ghidra, among others.
Dynamic malware analysis is performed by observing the behavior of the malware piece once it's been executed. This type of analysis is usually performed in a controlled environment to avoid infecting production systems.
In the context of malware analysis, a sandbox is an isolated and controlled environment used to dynamically analyze pieces of malware automatically. In a sandbox, the suspected malware piece is executed and its behavior is recorded.
Of course, things are not always this simple, and malware developers implement techniques to prevent the malware from being sandboxed. At the same time, security researchers develop their own techniques to bypass the threat actor's antisandbox techniques. Despite this chase of cat and mouse, sandboxing systems are still a crucial part of the malware analysis process.
Tip
There are some great online sandboxing solutions, such as Any Run (any.run) and Hybrid Analysis (https://www.hybrid-analysis.com/). Cuckoo Sandbox (https://cuckoosandbox.org/) is an open source and offline sandboxing system for Windows, Linux, macOS, and Android.