In this section, we will focus on defense-specific planning, skills, tooling, and infrastructure. A lot of these tools may be used in one-off analysis tasks or in conjunction with other tools to achieve a larger team goal. We will see how spending time preparing cooperative infrastructure during the planning and preparation phases before an engagement will save us precious time during live operations. As Leo Tolstoy said, "The two most powerful warriors are patience and time." I interpret this as: if we use our time wisely, patiently setting up defensive systems, we will be far more powerful when we encounter the opponent. I have heard defense described as an exercise in web building, analogous to a spider building its web. Following this analogy, the web must be wide enough to cover all of the space the spider is tasked with protecting, but also flexible enough to alert the spider when it has caught something. While it takes time for the spider to build its web, the result is a greatly improved ability to catch its prey. Still, the web must be maintained, and thus requires expertise and resources to make it a viable strategy. Further bolstering the idea that preparation is key, it is important to remember that the offense only needs to compromise its target once to get a foothold in the network. For the defense to be successful, it must defend against 100% of the attacks on it. As we know, this is nigh on impossible, so we invest in preparing response processes to ensure that when we are inevitably compromised, we can identify the threat, then contain and eradicate it effectively. We can elaborate on this by creating a network of machines that identify compromise, buffering the systems we are protecting. This is a callback to the concept of defense in depth we highlighted in the last chapter.
If the breach of a single system is near impossible to prevent, by creating a network of hardened systems we can detect the offense as they pivot through the network toward their goal. By invoking multiple defensive technologies in our strategy, we greatly increase the likelihood we will detect the offense at various stages in their attack chain. During planning, it is important to prioritize the infrastructure within the strategy tailored to your group's needs. Knowing that we may lose critical infrastructure at some time during an event, we should keep in mind alternative options in the event this occurs. This is a critical part of the contingency planning mentioned earlier, and in corporate parlance would be part of our business continuity planning strategy. In line with best practice, we should also use these alternative tools and methods to verify our primary tool's results to ensure that these tools aren't being deceived. It is common for an attacker to backdoor a system or deploy deceptive techniques to alter forensic tooling output, with the intent of confusing defenders.
It is well understood that the best investment a defensive team can make early on is in security log generation, aggregation, and alerting. In order to achieve these capabilities, we must generate logs from all critical systems and store them somewhere centrally.
Security logs can later be reviewed during a significant event, alerted on, and utilized for forensic reconstruction. Security collectors or agents are typically used to generate data from active infrastructure. I've always bucketed digital security collection into three categories: network-based telemetry, host-based telemetry, and application-specific or log-based telemetry. We will be considering all three in this book as they each have different strengths and weaknesses. We will use various agents to aggregate information from these sources into a central location for analysts to triage. For example, network monitoring can be helpful for identifying unknown devices operating in your network whereas application-specific logs could reveal detailed protocol information showing fraud or abuse in an application. In competitions, I like to prioritize network-based visibility, then host-based, and lastly application-specific telemetry, for the ability to spot a new compromise. Host-based agents or collectors are extremely helpful for investigating individual compromises, particularly with getting details around the infection and responding on the machine. Application-specific security metrics are likely the most important in a corporate incident as they are likely tied to your core business practices and could show the attacker moving on their goals or abusing your data, even if they exploited the principle of humanity and compromised legitimate users. For example, if your core product were a massive multiplayer online game, adding security logs and metrics to your game would likely uncover direct abuse faster than searching for an internal compromise. That said, this data is less useful in an attack and defense competition as the focus is typically on network penetration with less complex web applications in play. 
We will begin by looking at security log generation at these various sources, examining host-based, network-based, and application-specific telemetry, then we will cover some additional log aggregation, sorting, and search technologies. The logging journey does not stop there – after alerting on events we will look at post-processing and enrichment, including artifact extraction, storage, and analysis. The following is a slimmed-down list of some high-level projects you may want to consider when planning the toolsets for your defensive team. Within each of these areas, there are a number of techniques and tools that can be implemented. I will primarily focus on free and open-source solutions.
Signal collection
To start, let us look at host-based security event generation and collection. In this space, there are many traditional solutions, such as anti-virus providers like McAfee, Microsoft Defender, Symantec Endpoint Protection (SEP), Kaspersky, and ClamAV, to name a few. While often thought of as deprecated, these agents can still produce particularly useful alerts on known malware and offensive techniques. Some platforms, such as SEP and Kaspersky, can also provide alerts on statistical anomalies, like when attackers use a crypter or packer to obfuscate their payloads.
While these solutions can be very useful in a corporate environment to deal with commodity threats, they will be less useful in attack and defense competitions where the offense may leverage custom malware. There are also endpoint detection and response (EDR) platforms, which are a more modern evolution of previous anti-virus scanning solutions. While EDR platforms incorporate many of the same factors as a traditional AV, one major differentiating factor is these tools let operators make arbitrary queries on their data. EDR agents also enable remediation and response actions to be taken on the impacted hosts remotely while the host is still online, which is known as a live response. These capabilities can be extremely effective when dealing with a live attacker, by leveraging the real-time ability to counter the attacker's plans on a specific host. Another core value of these tools is recording at a higher level of granularity all actions taken on the target. For example, out-of-the-box Windows and OS X may not record process creations, command-line parameters, modules loaded, and so on. EDR agents can be configured to record detailed process telemetry and to send this data to a central server, allowing for alerting and reconstruction of the incident. Reconstruction of the breach is key to ensuring we can prevent further occurrences of the threat. This is a key theme when performing incident response and is known as root cause analysis (RCA). As we will see in Chapter 8, Clearing the Field, if we attempt to remediate an intrusion without performing RCA, we risk only partially remediating the compromise, tipping our hand to the attacker and allowing them to change their tactics. With EDR agent data it is easy to investigate a single host, then search for the techniques or malware used there across the rest of the hosts or fleet. 
Using EDR agents also enables a way to interrogate all hosts with a security hypothesis to help determine if better alerting can be written, a process known as hunting. We will visit these hunting techniques much more in Chapter 7, The Research Advantage, where we look at discovering new alerts, forensic artifacts, and even log sources. EDR agents can also be used to collect rich behavioral data about processes, such as which files, network connections, and handles a process has open. Behavioral data can create some of the strongest alert types by ignoring fungible variables, such as process names, and focusing on metrics including how many files or network connections the program makes. Such behavioral technology could detect abstract techniques like port scanning or encrypting files for ransomware goals, regardless of the tool implementing the technique. Another popular technique for detecting compromise in corporate environments using EDR solutions is known as anomaly detection. This involves sorting all of the processes or executable telemetry in a given environment and going through the outliers. Often starting with the fewest occurrences of a given executable or process in an environment will uncover malicious anomalies. There are many popular commercial offerings in this space, such as Microsoft's Advanced Threat Protection, CrowdStrike, CarbonBlack, and Tanium, to name a few. One of the issues with commercial offerings is they are often configured to alert on as few false positives as possible.
This is important in a long-term deployment as we want to minimize analyst fatigue with unnecessary alerts. However, in a competition setting, where the time frame is shorter and we know we have an attacker in the environment, we will want to configure our host-based security collection to be as verbose as possible. By having sufficiently verbose endpoint collection we should be able to triage more esoteric hacker techniques or debug strange processes we encounter. I like using similar open-source EDR applications such as OSQuery[10] for extra enrichment and GRR Rapid Response[11] for additional investigative inquiries. Other very popular open-source EDR frameworks you could consider are Wazuh[12] or Velociraptor[13]. Both frameworks have a long tenure in the security space and have evolved over a number of years, making them robust and fully featured. Regardless of the solution you choose, host-based signal enhancement is great for digging into an incident on a specific host or searching the entire fleet for an indicator.
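The least-frequency sorting described above can be illustrated with a few lines of Python. This is only a minimal sketch: the `(hostname, executable_path)` tuple layout, the `stack_processes` helper, and the sample data are all assumptions made for illustration, not the export format of any particular EDR product.

```python
from collections import defaultdict


def stack_processes(events):
    """Count how many distinct hosts ran each executable path.

    `events` is an iterable of (hostname, executable_path) tuples, a
    hypothetical flattened export of EDR process telemetry.
    Returns a list of (executable_path, host_count), rarest first.
    """
    hosts_by_exe = defaultdict(set)
    for host, exe in events:
        hosts_by_exe[exe].add(host)
    counts = [(exe, len(hosts)) for exe, hosts in hosts_by_exe.items()]
    # Sort by host count ascending, then by path for a stable order
    return sorted(counts, key=lambda item: (item[1], item[0]))


# Made-up sample telemetry for three hosts
sample_events = [
    ("web01", "/usr/sbin/sshd"),
    ("web02", "/usr/sbin/sshd"),
    ("db01", "/usr/sbin/sshd"),
    ("web01", "/usr/sbin/nginx"),
    ("web02", "/usr/sbin/nginx"),
    ("db01", "/tmp/.x/kworker"),
]

for exe, host_count in stack_processes(sample_events):
    print(f"{host_count:>3}  {exe}")
```

The executables seen on the fewest hosts appear at the top of the output, which is where an analyst would likely start reviewing outliers.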
Network monitoring is an immensely powerful source of security data. By setting up strategically placed network taps you can see which devices and protocols regularly communicate over your network. As previously mentioned with host-based data, network telemetry can be used to spot anomalous or blatantly malicious traffic by sorting the protocols or destinations in your traffic. A good network monitoring program can be used to slowly harden the network posture by enabling the sysadmin to understand what normal traffic is, thus enabling them to reduce highly irregular traffic at the firewall. In a competition environment, this can be as simple as allowing the scored protocols through the firewall, reducing the immediate set of traffic your team needs to analyze. Following the principle of physical access further, by controlling inline network traffic using an IPS technology such as Suricata or an inline firewall, you can block all traffic from a compromised machine or isolate it to a containment VLAN. When you quarantine or isolate a host to prevent further lateral movement, you can have preconfigured firewall rules that still allow your team to triage the host. These network monitors can also be used for signal analysis, such that the defense can observe anomalous network transfers even if the offense attempts to tunnel those communications through another protocol or host. Throughout this book, we will be using a combination of Snort, Suricata, Wireshark, and Zeek to look at network traffic. Snort is nice for identifying known malicious patterns of network traffic; we will use Snort much like our traditional AV enhancements[14]. Suricata is similarly useful for helping us identify malicious behavior patterns in traffic[15]. Zeek is great for breaking down different protocols and providing detailed logs about the protocol flows[16]. 
These core monitoring applications will serve as permanent solutions we can deploy around the network, providing powerful capabilities if we can get the infrastructure in place. Network monitoring is also very good for identifying problems on the network, making it a strong debugging resource.
In a competition setting for example, if a service is showing as down on the scoreboard, the defensive team can use their network monitors to quickly understand if the issue is a routing problem on the network or an endpoint issue on the affected host. While endpoint detection can be like searching for a needle in a haystack, network monitoring is like watching traffic on a highway – even at high speeds, it is often much easier to observe malicious behavior and where it is originating from. While you will occasionally receive firewall and network monitoring appliances in a default network architecture or competition environment, you can almost always rearchitect the network or routing to put one in place. In line with the principle of physical access, if you own the network switches you can likely hook a device onto the SPAN or mirrored ports[17] to receive traffic on your monitoring interface. Furthermore, you could route traffic through a single host and turn it into a network monitor using a command-line tool like tcpdump[18]. This can quickly be done with the following one-liner, which will capture all traffic at a given interface, in our case eth0:
$ sudo tcpdump -i eth0 -tttt -s 0 -w outfile.pcap
Granted, you will want to make sure the machine you collect traffic on has sufficient throughput and disk space to store the data. Collecting raw pcap data can build up very quickly, meaning you will need proper storage or to keep an eye on any live collection. A great one-off tool you can use for on-the-fly network traffic analysis is Wireshark[19]. This tool is very popular because it comes with a GUI that will colorize protocols and allow operators to follow select TCP streams. Wireshark also includes a modular plugin framework, so if you encounter a new protocol you can reverse engineer it, then include the protocol dissector in Wireshark for it to decode[20]. While you can easily use these quick solutions, you will likely want to invest in infrastructure here to really harness these capabilities over the long term. That said, Wireshark even comes with a command-line alternative called tshark, which is a headless network collection and parsing tool. tshark can do a number of analysis tasks on raw pcaps, but it can also collect network events for you as well. You can even use tshark to perform modified collection and produce special logs like the following, which will give all source IPs, destination IPs, and destination ports regarding traffic to and from a machine[21]:
$ sudo tshark -i eth0 -nn -e ip.src -e ip.dst -e tcp.dstport -Tfields -E separator=, -Y ip > outfile.txt
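Once a comma-separated flow log like the one above exists, a short script can summarize it. The following is a rough sketch, assuming each line holds exactly three fields (source IP, destination IP, TCP destination port); the `top_talkers` helper and the sample lines are hypothetical, and rows with any other number of fields are simply skipped.

```python
import csv
from collections import Counter


def top_talkers(lines, limit=5):
    """Count (source IP, destination IP, destination port) triples.

    `lines` is any iterable of text lines in a src,dst,dstport layout.
    Packets without a TCP destination port (for example UDP or ICMP)
    have an empty third field and are counted under the port label "-".
    """
    flows = Counter()
    for row in csv.reader(lines):
        if len(row) != 3 or not row[0] or not row[1]:
            continue  # skip blank or unexpected rows in this sketch
        src, dst, port = row[0], row[1], row[2] or "-"
        flows[(src, dst, port)] += 1
    return flows.most_common(limit)


# Made-up sample lines standing in for the contents of outfile.txt
sample = [
    "10.0.0.5,10.0.0.9,443",
    "10.0.0.5,10.0.0.9,443",
    "10.0.0.7,10.0.0.9,22",
    "10.0.0.5,8.8.8.8,",
]

for (src, dst, port), count in top_talkers(sample):
    print(f"{count:>5}  {src} -> {dst}:{port}")
```

To run it against a real capture, an open file handle can be passed in place of the sample list, for example `top_talkers(open("outfile.txt"))`.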
Another important source of logs that may be available to you is application-specific security enhancements. Often security isn't considered when a service is first built and is instead added later as an in-line product, meaning it sits in the middle of the network route used to access the service.
This may look like a custom security appliance in your network, such as an email security gateway or a web application firewall in front of an important web app. These tools will also generate logs and alerts that are critical to your security program. For example, phishing is seen as a prominent vector into many organizations, so those organizations may use a product such as Proofpoint or Agari to screen incoming emails for security information and potentially alert on phishing emails. These tools could also provide application-specific response capabilities, for example with email, it could offer the ability for users to report emails or for network defenders to mass purge selected malicious emails. These security tools also cost a significant investment, both in terms of budget and expertise, so it's important to make them first-class citizens and give them the proper attention if your organization has decided to invest resources in them. Often, they are sold as a license or service subscription and come with vendor support, meaning you should prioritize their configuration and use the support resources if you've made the investment or been given the technology in a competition. A close relative to these security application logs are abuse metrics related to your business's core service. For example, if your organization runs a large custom web application that supports e-commerce or virtual hosting, you will want detailed metrics related to the use or abuse of this service. These could be metrics such as the number of transactions an account makes or top users of your API service. Just as we saw with other log sources, similar behavioral and anomaly detection methods apply. From a behavioral perspective, you could look at how quickly users navigate your pages to determine if automated abuse is occurring. 
From an anomaly perspective, you could sort data and view login attempts from similar IP addresses, to detect account takeover attempts of your user population. Another important log source to review is internal tooling and applications. Reviewing your own tool's logs for abuse or anomalous logins can help determine if someone in your group was compromised or if you have an insider threat. While auditing internal tool logs will likely not take as high a priority during an active network compromise, overlooking these logs would be considered a grave mistake in ensuring your operational security.
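As one possible illustration of the login anomaly idea above, the sketch below groups failed logins by source IP and flags any address that has tried several distinct accounts. The `(source_ip, username, success)` event shape, the `suspicious_sources` helper, and the threshold of three accounts are assumptions chosen for the example, not values taken from any specific application.

```python
from collections import defaultdict


def suspicious_sources(login_events, min_accounts=3):
    """Return source IPs with failed logins against many distinct accounts.

    `login_events` is an iterable of (source_ip, username, success) tuples,
    a hypothetical shape for an application's authentication log.
    The result maps each flagged IP to the sorted list of targeted accounts.
    """
    accounts_by_ip = defaultdict(set)
    for ip, user, success in login_events:
        if not success:
            accounts_by_ip[ip].add(user)
    return {
        ip: sorted(users)
        for ip, users in accounts_by_ip.items()
        if len(users) >= min_accounts
    }


# Made-up sample events: one address sprays several accounts
sample_logins = [
    ("203.0.113.9", "alice", False),
    ("203.0.113.9", "bob", False),
    ("203.0.113.9", "carol", False),
    ("198.51.100.4", "alice", False),
    ("198.51.100.4", "alice", True),
]

for ip, users in suspicious_sources(sample_logins).items():
    print(f"{ip} failed logins for: {', '.join(users)}")
```

A single user mistyping a password stays below the threshold, while one address cycling through many usernames stands out.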
Finally, active defense infrastructure can help us coax the attacker into revealing themselves within the network. Active defense tools are solutions that seek to deceive attackers into thinking a piece of infrastructure is vulnerable, in an attempt to lure the attacker out[22]. Active defense infrastructure will be a major theme of this text, giving the defense an advantage through setting traps for the offense. We will see how showing the false will help us detect the attacker by deceiving them to think we are vulnerable when we are not. Practically, this means using tools like honeypots, honey tokens, and fake infrastructure to trick our opponent. While this can be thought of as extraneous infrastructure, it comes back to the principle of deception we covered in the last chapter.
By creating fake but believable targets that are easy for our target to hack, we can bait them into divulging themselves and giving the defense the upper hand. This investment is a bet on the effectiveness of the deception. I would consider this solution to be an additional tactic to the collection methods already described, but probably not a great standalone tactic. The real secret to creating effective honeypots is to make sure that there are readily available paths that lead to the honeypot so that if an attacker were to compromise a typical user of a machine, they would naturally discover and be drawn to the trap. There are tons of examples within the Awesome Honeypots GitHub repository (https://github.com/paralax/awesome-honeypots), but the important part is picking an applicable solution to your network. Honeypots or tokens have been made for all kinds of applications and their use in your network should be strategic; otherwise, they will sit undiscovered for years. That said, if you can make a juicy target that is easy to discover, you may find it to be an excellent indicator of an attacker on your network.
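A honey token can be as simple as a decoy account name or key that no legitimate user or service should ever touch, so any appearance in a log is worth an alert. The following is a minimal sketch of that check; the token values, the log line layout, and the `find_honeytoken_hits` helper are all hypothetical.

```python
def find_honeytoken_hits(log_lines, tokens):
    """Return (line_number, token, line) for each log line containing a decoy."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for token in tokens:
            if token in line:
                hits.append((number, token, line.strip()))
    return hits


# Hypothetical decoys: an account and an API key planted where an attacker would find them
HONEY_TOKENS = ["svc_backup_admin", "FAKE-API-KEY-0001"]

# Made-up authentication log lines
sample_log = [
    "2024-05-01 10:00:01 login ok user=alice src=10.0.0.12",
    "2024-05-01 10:03:44 login failed user=svc_backup_admin src=10.0.0.77",
    "2024-05-01 10:04:02 login ok user=bob src=10.0.0.15",
]

for number, token, line in find_honeytoken_hits(sample_log, HONEY_TOKENS):
    print(f"ALERT line {number}: honey token '{token}' seen: {line}")
```

Because nothing legitimate uses the decoys, this kind of check should produce very few false positives, provided the tokens are planted somewhere an attacker would plausibly look.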
Data management
Log aggregation is one of the biggest time-saving tasks a defensive team can focus on. In my opinion, logging pipelines are one of the unsung heroes of modern defensive infrastructure. Simply, logging doesn't get the attention it deserves in most defensive publications. In many corporate IT deployments, logging is ubiquitous and transparent, already happening in the background of most production environments. If your organization can piggyback on an existing logging pipeline it may save your team a great deal of infrastructure management. In a competition environment, this infrastructure is far less likely, and if you do manage centralized logging it will probably be through chaining simple tools. Logging can be as simple as sending everything to a single host, or as complex as deploying a tiered security information and event management (SIEM) service. Often logging pipelines are incorporated into the SIEM application, but it doesn't always have to be the case and logging can benefit from decoupling. Services like Filebeat or Logstash may be used to supplement an all-in-one solution such as Splunk[23]. Splunk, a vendor solution, can also quickly provide log decoration and normalization benefits before the logs ever reach the SIEM. Regardless of whether you use the full SIEM or not, harnessing a logging pipeline means you can edit your logs and standardize them as you collect them. If you're not using a centralized logging solution such as a SIEM, you can still use a logging pipeline to enrich logs on a single host or send them all to a single location. Centralized logging can even be as simple as using default capabilities such as rsyslog, SMB, or even Windows event log[24]. The reason I say simple log aggregation is different from sending to a SIEM is that there is a lot of power in indexing, searching, alerting, and even creating rich displays of our data that a SIEM gives us.
From a consulting perspective, this could look like a logging pipeline to support scripts that rapidly collect and index forensic data to scope an incident. Regardless, being able to triage issues across your target environment from a single host is a huge time-saving feature.
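To make the idea of standardizing logs as they are collected more concrete, here is a small sketch of a normalization step. It assumes a traditional syslog-style line layout and a made-up common schema (`timestamp`, `host`, `program`, `message`, `collector`); real pipelines such as Logstash do this with their own configuration rather than with this code.

```python
import json
import re

# Matches lines like: "Jan 12 03:14:07 web01 sshd[1234]: message text"
SYSLOG_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<host>\S+)\s"
    r"(?P<program>[^\[:\s]+)(?:\[\d+\])?:\s"
    r"(?P<message>.*)$"
)


def normalize(raw_line, collector):
    """Map one raw log line onto a small common schema (an assumed layout)."""
    line = raw_line.rstrip("\n")
    match = SYSLOG_PATTERN.match(line)
    if match:
        event = match.groupdict()
        event["parsed"] = True
    else:
        # Keep unparsed lines rather than dropping evidence
        event = {"timestamp": None, "host": None, "program": None,
                 "message": line, "parsed": False}
    # Decorate every event with the name of the agent that collected it
    event["collector"] = collector
    return event


sample_line = ("Jan 12 03:14:07 web01 sshd[1234]: "
               "Failed password for root from 10.0.0.5 port 22 ssh2")
print(json.dumps(normalize(sample_line, "web01-agent"), sort_keys=True))
```

Emitting every event in one shape, with unparsed lines preserved, is what later lets a single search cover logs from many different sources.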
A full SIEM is a powerful investment to help sort and search logs. Products like Splunk or Elasticsearch can provide rich capabilities in terms of searching for and combining multiple data sets. In a competition environment, this may be more of a dream unless the hosting organization provides one or allows you the infrastructure to host one. That said, this is a critical piece of technology in any real defensive posture. The ability to index multiple log sources, search them in concert, transform data sets on the fly, combine them with external data sets, and display rich tables or graphs of data is invaluable for this type of analysis. As we briefly touched on earlier, Splunk is a dominant service in this space because of its ability to index and transform data. Splunk has many advanced features such as User Behavior Analytics (UBA), which correlates logs to perform anomaly detection on various user activities and detect account compromise[25]. Splunk also offers an integration platform where users can write plugins to use data with custom services or provide unique displays in the UI. An open-source alternative to Splunk is HELK[26], which is a free option providing similar functionality for those on a budget. HELK is a combination of many open-source logging technologies, built around the ELK stack of Elasticsearch, Logstash, and Kibana, and shows how the principle of innovation can easily be applied to create security-specific solutions. Throughout our efforts in this book, we will primarily use Elasticsearch with the HELK stack because it is open-source and easily available[27]. If you are looking for a slimmer deployment, ELK also has built-in alerting functionality as standard. We can also look at using a special SIEM just for indexing and analyzing our network-based logs. A tool such as Vast can ingest both Zeek logs and raw pcap to provide search capabilities on these data sets[28]. Logs will be the base element we will ingest and work with throughout the network. A SIEM can help normalize those logs by mapping them to common elements, so you can perform intelligent searches across all your data and not just individual log sets.
A nice to have would be a security orchestration, automation, and response (SOAR) application to help automate alert enrichment. In many large deployments, SOAR applications act as the connective tissue tying a myriad of other appliances into the SIEM. Such an application will reach out across your network and correlate the alerts with various information to get more context. This tool could enrich elements of the alert with more data, such as the user in an alert with all of their attributes from Active Directory. A good open-source example of a SOAR platform is Cortex[29].
These larger applications that tend to integrate lots of infrastructures are a big investment but the reward in enhancing the triage for a professional security operations center (SOC) is unparalleled. This application will act as a central hub that allows analysts to quickly interrogate and act on various pieces of infrastructure throughout the environment. Not only will analysts get more information with every alert, including rich context, but they will also be able to triage incidents quicker and with automated responses, saving time in operations. A single pane of glass for all decorated event triage context is critical during a high-stakes event. Switching between many tools, technologies, or UIs is time consuming and error prone. SOARs help the defense solve this problem in a swift and repeatable way.
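The enrichment step a SOAR performs can be pictured as a lookup that attaches context to an alert before an analyst sees it. The sketch below uses a plain dictionary as a stand-in for a directory service such as Active Directory; the field names, the sample users, and the `enrich_alert` helper are assumptions for illustration and do not reflect the API of Cortex or any other platform.

```python
# Stand-in for a directory service; a real deployment would query it over the network
DIRECTORY = {
    "jsmith": {"department": "Finance", "is_admin": False},
    "adavis": {"department": "IT", "is_admin": True},
}


def enrich_alert(alert, directory):
    """Return a copy of the alert with directory attributes under "user_context"."""
    enriched = dict(alert)
    attributes = directory.get(alert.get("user"))
    if attributes is None:
        # An unknown account is itself useful context for the analyst
        enriched["user_context"] = {"known": False}
    else:
        enriched["user_context"] = dict(attributes, known=True)
    return enriched


# Made-up alert as it might arrive from a SIEM
sample_alert = {"rule": "suspicious_login", "user": "adavis", "host": "ws-042"}
print(enrich_alert(sample_alert, DIRECTORY))
```

Returning a copy rather than modifying the alert in place keeps the original record intact, which is a reasonable default when the raw alert may later serve as evidence.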
A separate component of the SIEM or SOAR should be a set of events or alerts, along with plans to review and update these events regularly. Ideally, these should be decoupled from the SIEM or SOAR application so that the team can review and curate the alert set independently. You can use infrastructure to help you manage this; using a project such as Threat Alert Logic Repository (TALR)[30] can help manage alerts by organizing them according to features, tactics, or behavior. Using such a project could also help bootstrap your detection logic by giving you some good starting rules. OpenIOCs, or indicators of compromise, were a generic type of alert format invented by Mandiant in 2013 in an attempt to standardize the alerting format[31]. I bring it up because the OpenIOC format included what I consider an essential feature of alerts, which is combinatory logic. A major failing of traditional antivirus solutions is taking too simplistic of an approach in their detection logic; by not combining multiple sources of data or context they often fail to detect more advanced attacker techniques. OpenIOC logic aims to provide defenders with a rich set of logic to create alerts that can take multiple pieces of evidence into account. Regardless of the event syntax or format you use, it is important to both standardize your detection logic and create robust event logic. This will help with reviewing existing alerts and strategizing future detection initiatives. Playbooks are another set of solutions your group can catalog and review. Playbooks are a technology that can help enhance alerts, by automating the associated actions that should be taken if that alert triggers in your SOAR[32]. Your alert logic should be essential to your defensive organization, as this is what your operators are trained to look for as malicious activity. 
This should be written down and codified instead of kept as tribal knowledge, both to help disseminate the information among the team and regularly review its merit in terms of detection logic. By organizing your alert logic, you can begin to assess your gaps and where your team may be weak in terms of detection logic. If you have an offensive operations team, this would be a great place to have them help perform adversary emulation and brainstorm potential detection or alert logic. Reviewing popular techniques or covering gaps in your operating team's detection logic is a great way to prepare for both cyber competitions and real conflict.
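The value of combinatory logic in alerts can be shown with a tiny rule evaluator. This is a sketch of the general idea only: the nested dictionary format, the field names, and the `evaluate` function are invented for the example and are not the OpenIOC XML schema or the rule syntax of any real product.

```python
def evaluate(rule, evidence):
    """Recursively evaluate a nested AND/OR rule against observed evidence.

    A rule is either a leaf {"field": ..., "equals": ...} or a node
    {"op": "AND" | "OR", "items": [rule, ...]} (an assumed format).
    `evidence` is a flat dict of observed values for one event.
    """
    if "op" in rule:
        results = [evaluate(item, evidence) for item in rule["items"]]
        if rule["op"] == "AND":
            return all(results)
        if rule["op"] == "OR":
            return any(results)
        raise ValueError(f"unknown operator: {rule['op']}")
    return evidence.get(rule["field"]) == rule["equals"]


# Hypothetical rule: PowerShell launched by an Office application
office_spawns_powershell = {
    "op": "AND",
    "items": [
        {"field": "process_name", "equals": "powershell.exe"},
        {"op": "OR", "items": [
            {"field": "parent_name", "equals": "winword.exe"},
            {"field": "parent_name", "equals": "excel.exe"},
        ]},
    ],
}

event = {"process_name": "powershell.exe", "parent_name": "winword.exe"}
print(evaluate(office_spawns_powershell, event))
```

Neither condition alone is alarming, since PowerShell and Word both run constantly in many environments. Requiring both together is what turns two noisy signals into a more specific alert.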
With a real defensive operation, you would be remiss to not include an incident response case management system or alert management system. In a corporate deployment, this would be used between shifts and over a long time to track all ongoing cases and to make sure nothing is being dropped. In a competition environment, this can be as simple as a strike list of potentially compromised hosts or hosts you need to triage. Whatever your desired workflow is, rapidly triaging and resolving alerts, escalating alerts into larger incidents potentially for a different team, or having a system where you can track which cases (and steps in a given case) are being actively worked, is a vital system. This can be as simple as a spreadsheet to track infected hosts or remediation tasks. These spreadsheets can include tabs per host regarding who is triaging which pieces of evidence at any given time. Or this can be a standalone system with a rich application where users can upload and tag additional pieces of evidence to a case. ElastAlert comes built into HELK, which makes it an easy choice for deployment and testing [33]. We can also use ElastAlert in TheHive for our alert management system, as it comes built in and makes it easy to integrate with other deployed systems. ElastAlert can then send operators emails when they trigger on a known alert, and the alert triage flow can be handled in TheHive[34]. By using TheHive we can integrate our alerts into other standalone services we may have, including integration to Cortex, allowing us to take actions directly from alerts. Using TheHive, with Cortex enrichment from the rest of our infrastructure, will be a powerful single interface that operators can use for alert investigation and resolution; otherwise, they may have to bounce between many systems in triaging an alert or incident.
A further set of nice-to-haves would be any form of intelligence aggregation application. Applications such as MISP can take multiple intelligence feeds and integrate them into a single location where your team can curate and track intel indicators[35]. Collaborative Research Into Threats (CRITS) is another such application that can aggregate multiple intelligence feeds and map connections of artifacts with its internal graphing database[36]. Professional intelligence services can also be purchased, which manage the intelligence feed curation on your behalf; however, these often carry a significant annual cost. Hosted intelligence platforms can then be directly integrated into the SIEM or SOAR application to provide threat enrichment if there is ever an intel indicator match. Such an application could also run artifacts through your malware triage platforms, copy artifacts to your forensic evidence store, and even start an incident response case in your case management system if properly integrated. While aggregating external threat intel is extremely powerful, another useful feature of these applications is that they document your detailed notes and comments about threat data. The knowledge that another team member previously investigated a specific threat, or saw similar indicators in a different alert, is powerful information to share within a team.
A private forensic evidence management system is another consideration for any defensive team. A natural follow-on to an incident response system is a system to store and catalog forensic artifacts that are discovered. This can help dramatically in post-analysis, attribution, or gaining an advantage over the opponent. This will likely be seen as an extraneous consideration until other systems are in place, but even a simple solution here can pay dividends in years to come with evidence management and malware analysis. Ideally, this should be integrated into the case management system, but it can be as simple as a network share or SFTP server where artifacts are dumped for backup purposes. You could also edit the permissions such that users couldn't update or delete others' evidence, perhaps by making files immutable after they are written. Such a write-once system would make sure artifacts or evidence are not accidentally overwritten or tampered with. These simple innovations could assure the integrity of artifacts and harden the authorization of the application. On Linux, a first step is setting the sticky bit on the shared directory, so only a file's owner, the directory's owner, or root can rename or delete files within it; whether a file can be edited is still governed by that file's own permissions. You can set the sticky bit on a directory or share with: chmod +t dir. You can take this further by making files immutable so that even the owner can't edit or delete the file with chattr +i file.txt. Ideally, you will also want something to hash files when they are uploaded to track and verify their integrity. Some of the most important attributes to store are the data itself, a hash of the data, the date it was written, and potentially the user that wrote it. The following is a quick script to show the reader how easy it is to innovate on these concepts with just a little scripting. In this case, we use Python 3.6 to watch a directory and make any new file added to the directory immutable, as well as adding a timestamp, file path, and hash of the file to our log. This script only runs on Linux because we make use of the native chattr binary, which needs root privileges to set the immutable flag. Be careful not to run the script in the directory it's monitoring or else it will enter an infinite loop as it observes itself updating the log file:
import time
import logging
import hashlib
import subprocess
# Comment #1: watchdog provides the file system event monitoring
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

# Comment #2: append timestamped messages to the integrity log
logging.basicConfig(filename="file_integrity.txt",
                    filemode='a',
                    level=logging.INFO,
                    format='%(asctime)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S')

def main():
    path = input("What is the path of the directory you wish to monitor: ")
    # Comment #3: create the event handler and hook on_created to our function
    event_handler = LoggingEventHandler()
    event_handler.on_created = on_created
    observer = Observer()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

def on_created(event):
    # Comment #4: actions to run whenever a new file is written
    if event.is_directory:
        return  # only files can be hashed
    # Make the new file immutable (setting this flag requires root)
    subprocess.run(['chattr', '+i', event.src_path], check=False)
    # Use a fresh hasher per file so one digest never includes another file's data
    hasher = hashlib.sha1()
    with open(event.src_path, 'rb') as afile:
        hasher.update(afile.read())
    digest = hasher.hexdigest()
    logging.info("Artifact: %s \nFile SHA1: %s\n", event.src_path, digest)
    print("New file added: {}\n File SHA1: {}\n".format(event.src_path, digest))

if __name__ == "__main__":
    main()
The preceding script is fairly simple but vastly powerful and applicable. You can use the script for almost any file reaction, and it can be chained together to create pipelines of analysis and processing for almost any task. Let us take a deeper look at the code. Below Comment #1, we can see the watchdog imports. watchdog is a critical library that will give us the ability to monitor for and react to events. Operators may need to install the watchdog library with the pip package manager. Next, below Comment #2, we can see how logging is configured so that results are written to a text file. In this configuration, we can see the name of the log file and that the log file is in append mode, along with the format of the log messages. Below Comment #3, we can see the event handler being created. We can also see the default event_handler.on_created event being set to our function on_created.
Next, we see the observer being instantiated, followed by the observer being associated with our event handler and the target file path, and then the observer being started. Jumping down to below Comment #4, we can see the arbitrary actions that we invoke when the observer sees a new file write. In our case, we are spawning a new process to run chattr +i on the newly written file, as discussed previously. We also use this function below Comment #4 to open the newly created file, get the file's SHA1 hash, and write this hash to our log file. In the next section, we explore more analysis options we can perform on files we collect.
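To illustrate the attributes worth storing for each artifact (a hash of the data, when it was written, and who wrote it), here is a minimal sketch of an evidence record and a later integrity check. The function names evidence_record and verify_record and the dictionary keys are hypothetical, chosen only for this example, and I have swapped in SHA-256 here rather than the SHA1 used in the script above.

```python
import hashlib
import os
from datetime import datetime, timezone


def evidence_record(path, chunk_size=65536):
    """Describe one artifact: path, SHA-256, size, write time, and owner."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large artifacts do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    info = os.stat(path)
    written = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "path": os.path.abspath(path),
        "sha256": digest.hexdigest(),
        "size_bytes": info.st_size,
        "written_utc": written.isoformat(),
        "owner_uid": info.st_uid,  # numeric owner ID as reported by the OS
    }


def verify_record(record):
    """Re-hash the file and report whether it still matches the stored digest."""
    return evidence_record(record["path"])["sha256"] == record["sha256"]
```

A record like this could be written to the log at upload time, and verify_record could be run on a schedule to flag any artifact whose contents no longer match.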
Analysis tooling
Another set of tools I find absolutely critical is local analysis and triage tools. These could be tools that help you get more local telemetry, investigate suspicious processes, or even analyze an artifact you found on the target system. Analysis tools are critical to giving your operators more insight into common operating systems, forensic artifacts, and even unknown data they may discover. Some good examples of Windows local analysis tools are those from the SysInternals Suite, such as Autoruns, Process Monitor, and Process Explorer[37]. Such tools will allow analysts to look at programs that have been locally persisted, various programs and threads that are running, and specific system calls those programs are making. These could also be tools with file, log, or artifact collection and/or parsing capabilities; tools that allow you to investigate different pieces of evidence. For example, tools such as Yara could allow you to quickly search a disk or directory for interesting artifacts in files[38]. Another set of tools, including Binwalk[39] or Scalpel[40], could then let you extract embedded files or artifacts that were discovered in Yara scans. By chaining local analysis tools like this, a team could quickly develop hunting routines to find trojaned files or embedded artifacts[41]. Traditional forensic tools also work wonders here, such as TheSleuthKit or RedLine, depending on the systems[42]. TheSleuthKit is still amazing for analyzing disk images and artifacts within images[43]. Likewise, tools such as RedLine or Volatility can be useful for doing on-the-fly memory analysis[44]. This allows for both rapid live response triage of a host, as well as pulling artifacts back for local analysis. On my defensive teams, I like to collect and prepare a standard set of tools team members can use for common analysis tasks, along with runbooks to help analysts use those tools.
This practice of tool preparation helps standardize our analysis tools and create experts on the team.
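To make the idea of hunting for embedded artifacts more concrete, the following is a simplified stand-in for a signature scan, written in plain Python. It is not the Yara or Binwalk engine and makes no use of their APIs; the SIGNATURES table and the find_signatures function are hypothetical names for this sketch, which only searches a byte buffer for a handful of well-known file magic values.

```python
# Simplified illustration of signature scanning; NOT Yara or Binwalk.
# Each entry maps a label to the magic bytes that start that file type.
SIGNATURES = {
    "zip": b"PK\x03\x04",
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
    "elf": b"\x7fELF",
    "gzip": b"\x1f\x8b\x08",
}


def find_signatures(data):
    """Return a sorted list of (offset, label) for every magic value found."""
    hits = []
    for label, magic in SIGNATURES.items():
        start = 0
        while True:
            offset = data.find(magic, start)
            if offset == -1:
                break
            hits.append((offset, label))
            start = offset + 1  # keep searching after this hit
    return sorted(hits)


def scan_file(path):
    """Read a file from disk and scan its full contents."""
    with open(path, "rb") as handle:
        return find_signatures(handle.read())
```

A hit at a non-zero offset suggests one file embedded inside another, which is the kind of lead an analyst might then hand to a dedicated carving tool. Short magic values can produce false positives, so treat the output as a triage hint rather than a verdict.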
An incredible example of the principle of innovation is BLUESPAWN, a tool developed by the CCDC team representing the University of Virginia (UVA)[45]. This tool is a Swiss Army knife of existing tools and capabilities that students at UVA previously automated to meet their needs. BLUESPAWN is written in C++ and only targets the Windows operating system but is a powerhouse in terms of functionality.
The UVA team claims BLUESPAWN is a force multiplier, allowing team members with a Linux focus to easily triage Windows systems by using the tool. BLUESPAWN includes several high-level run modes, such as monitor, hunt, scan, mitigate, and react, for a variety of functionalities in one tool. BLUESPAWN is designed to unleash a verbose firehose of information back at operators, so the defense likely trains with various runbooks to help debug, interpret, and respond to the tool's output. BLUESPAWN can also automate much of the patching and hardening of a system using its mitigation features. BLUESPAWN also allows the defense to monitor and hunt in real time for specific techniques, giving them repeatable actions that they can use for triage. This tool will greatly enhance the capabilities of the group and would work excellently with a little training and some common runbooks[46]. In the next chapter, you will see how they use this tool to automate tools like PE-Sieve and hunt for process-injected beacons of Cobalt Strike[47]. In Chapter 3, Invisible is Best (Operating in Memory), we will take an in-depth look at this detection logic, walking through this reaction correspondence at play. Seeing this type of innovation puts the offensive teams on the back foot and gives the defensive teams a powerful advantage in their live response and triage capabilities.
Malware triage platforms, both static and dynamic, can be a powerful asset to any analysis team. These systems can be a cheap substitute for an actual reverse engineer, or a time saver for both reverse engineers and analysts. An open-source and extensible static analysis platform is Viper, where people can write extensions in Python to perform actions on individual forensic artifacts. Such a platform could act as the forensic storage and analysis capabilities all in one[48]. From here you could have various workers determine if files are executable files, extract data from them such as URLs and IP addresses, and integrate this platform back into your threat intel application for enrichment. This framework can easily be integrated into a dynamic analysis platform such as Cuckoo Sandbox, where analysts can see detailed run information from the binary[49]. Dynamic analysis can be extremely effective for getting more information via running malware in highly instrumented sandboxes, often revealing details that are obscured from basic static triage. At times, setting up dynamic sandboxing, especially Cuckoo Sandbox, can be exceedingly difficult due to various compatibility issues with supported hypervisors, agents, and virtual machines. If you're looking at Cuckoo, you may consider the GitHub project BoomBox, which will spin up a full Cuckoo deployment in a few simple commands[50]. BoomBox also deploys a feature in the sandbox infrastructure known as INetSim, which will fake network communications to tease more functionality out of the running malware[51]. These private infrastructure platforms will not likely be available during a competition environment, but perhaps similar cloud services will be in scope. Services such as VirusTotal[52], Joe Sandbox[53], Anyrun[54], and HybridAnalysis[55] can give a massive boost in analysis capabilities against a particular piece of malware, but also come with the drawback of using a public service.
With some public services, such as VirusTotal, offensive actors can write their own Yara rules to see when their malware gets uploaded to the platform. If this were the case, then uploading the sample would tip the defenders' hand, letting the offense know that they have acquired a particular sample.
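As a rough sketch of the kind of static triage worker described above, the following Python checks whether a buffer looks like an executable and pulls out URLs and IPv4 addresses for enrichment. This is not Viper's extension API; looks_executable and extract_indicators are hypothetical names, and the regular expressions are deliberately simple and will miss obfuscated or wide-character strings.

```python
import re

# Deliberately simple patterns; real malware often hides its strings.
URL_RE = re.compile(rb"https?://[A-Za-z0-9._~:/?#\[\]@!$&'()*+,;=%-]+")
IPV4_RE = re.compile(rb"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def looks_executable(data):
    """True if the buffer starts with a Windows PE (MZ) or Linux ELF header."""
    return data.startswith(b"MZ") or data.startswith(b"\x7fELF")


def extract_indicators(data):
    """Return URLs and valid IPv4 addresses found as ASCII in the buffer."""
    urls = sorted({match.decode("ascii") for match in URL_RE.findall(data)})
    addresses = set()
    for match in IPV4_RE.findall(data):
        text = match.decode("ascii")
        # Discard dotted quads with an out-of-range octet, such as 999.1.1.1.
        if all(int(octet) <= 255 for octet in text.split(".")):
            addresses.add(text)
    return {"urls": urls, "ipv4": sorted(addresses)}
```

The resulting dictionary could be pushed into a threat intel application for matching, which keeps the sample itself on your own infrastructure rather than on a public service.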
Data transformation utilities such as CyberChef can also be immensely helpful[56]. These should be considered auxiliary applications as they will not necessarily help in your core goals of detection. That said, hosted utilities can buy your team additional time and operational security in a crunch by giving them a centralized and secure service to perform common data transformations. This is also a great location to practice the principle of innovation. We can easily take local analysis tools such as those we've looked at earlier and create web services or other utilities that wrap those services. A great example of this principle is another homemade web application multitool, Pure Funky Magic (PFM)[57]. PFM contains many common utilities that analysts would use but via a central location to access and share transformations. Similarly, Maltego or other mind-mapping services can be excellent for sharing intelligence or data about threats or targets among team members[58]. These tools can be a force multiplier for sharing threat intelligence data and operational capabilities if you have that expertise on your team.
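To show how little code a basic transformation utility needs, here is a minimal sketch of chaining decoding steps in Python, loosely in the spirit of a recipe. It does not reproduce CyberChef or PFM; run_recipe, from_base64, from_hex, and xor_key are hypothetical helper names for this example only.

```python
import base64
import binascii


def from_base64(data):
    """Decode Base64-encoded bytes."""
    return base64.b64decode(data)


def from_hex(data):
    """Decode a hex string such as b'68656c6c6f' into raw bytes."""
    return binascii.unhexlify(data.strip())


def xor_key(key):
    """Build a step that XORs data with a repeating key."""
    def apply(data):
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    return apply


def run_recipe(data, steps):
    """Apply each transformation in order, feeding the output to the next."""
    for step in steps:
        data = step(data)
    return data
```

For example, run_recipe(blob, [from_base64, xor_key(b"\x41")]) would undo a Base64 layer and then a single-byte XOR. Wrapping a function like this in a small internal web service is one way a team could share common transformations from a central location.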
You should also consider offensive components on your blue team. This is essentially vulnerability management and penetration testing expertise, using the skills required to scan your infrastructure for vulnerabilities. You can pull a lot of this infrastructure from the next section on offensive perspectives, though I don't think the persistence or deception tactics apply if your team is just self-assessing for vulnerabilities. On Pros V Joes, an attack and defense competition with up to 10 team members, I have one or two team members focused on offensive operations. Because all of the network designs are the same in that competition, they begin by looking at the team's own infrastructure for vulnerabilities. This has many benefits: the closer infrastructure allows for quicker and more accurate scanning results, it allows us to locally develop and test exploits while protecting operational security, and it allows us to take points away from our opponents. After we've determined that our systems are reasonably hardened, we can automate scanning at regular intervals and turn our tools on our opponents' infrastructure for exploitation.
As you can see, there is a lot of infrastructure that needs to be set up and in place ideally before a cyber incident, or at least ready to rapidly deploy in the case of an incident. It requires great skill and planning in choosing what technologies to implement first and on what timetable, while also keeping resources available to do basic operations. If you want to play around with some of the technologies I've mentioned, I highly recommend checking out Security Onion 2[59]. This is an evolution of the very popular Security Onion, refactored with many of the tools we've already mentioned in this chapter.
While Security Onion 2 is designed to be deployed to production, you may also want to deploy dedicated hardware and software as a permanent solution. Many of the pieces of infrastructure I've mentioned will need their own dedicated deployments, potentially even with clustered hosting. This means you should use Security Onion 2 to explore potential solutions, see how they integrate with other services, use it for local triage, develop with it, and even deploy to production in smaller environments, but you should also consider deploying dedicated solutions. Obviously, there are some critical first steps, such as understanding the environment, building out the required talent, and fleshing out a development plan, but after that, each component of infrastructure will be a major investment in its own right. It's important to not take on more projects than you are adequately resourced to manage, so choosing your early infrastructure investments wisely is a key decision. Depending on the staffing, I think security telemetry, log aggregation, artifact analysis, and live response capabilities would be some of the most important to prioritize.
Defensive KPIs
It helps to have metrics to measure the operational efficiency of a team[60]. For that, we can use KPIs. KPIs are small measurable indicators we can use to benchmark how our team performs and gauge differences in their performance over time. For the defensive team, we may want to measure things like 1/10/60 time (detecting an intrusion within 1 minute, investigating it within 10 minutes, and remediating it within 60 minutes), or more generally the mean time taken to detect an attack, the mean time taken to respond to an incident, and the mean time taken to resolution per incident. Other metrics may include the number of incidents triaged, the mean time taken to triage an incident, outliers on incident triage, or the number of rules that have been reviewed, to suggest a few. Such metrics will help your team identify gaps or weak points in your process that may be failing silently or need more resource investment. Often security is discussed in black-and-white terms of success or failure, but there is actually a myriad of different outcomes and lots of progress to be made in preparing for a conflict[61]. Remember, the benefit of long-term planning is improving over time, and metrics are your tool to make sure your team is heading in the right direction.
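As a small worked example of these metrics, the following sketch computes mean time to detect, respond, and resolve from a list of incident timestamps. The field names occurred, detected, responded, and resolved are assumptions for this example, as is the choice to measure response and resolution from the moment of detection; your case management system may record these differently.

```python
from datetime import datetime
from statistics import mean


def minutes_between(start, end):
    """Elapsed time between two datetimes, in minutes."""
    return (end - start).total_seconds() / 60


def defensive_kpis(incidents):
    """Summarize mean detection, response, and resolution times in minutes."""
    detect = [minutes_between(i["occurred"], i["detected"]) for i in incidents]
    respond = [minutes_between(i["detected"], i["responded"]) for i in incidents]
    resolve = [minutes_between(i["detected"], i["resolved"]) for i in incidents]
    return {
        "incident_count": len(incidents),
        "mean_time_to_detect_min": mean(detect),
        "mean_time_to_respond_min": mean(respond),
        "mean_time_to_resolve_min": mean(resolve),
    }
```

Tracking these values per month or per exercise makes it easier to see whether changes to tooling or runbooks are actually moving the team in the right direction.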