Essential considerations
Let's look at some potential solutions to include in your roadmap, for either cyber competitions or larger operations. We will start with the essential properties shared by both sides of our asymmetric conflict. Whether on offense or defense, both sides rely on fluid communications and information sharing to carry out their operations. Both sides must build and maintain a team for these operations. Furthermore, both sides will partake in strategic and operational planning. In this section, we focus on what the offensive and defensive teams have in common; in later sections, we will focus on the differences between these unique teams.
Communications
As you begin to form your cyber operations team, plans should be documented to ensure the team has a set of broad goals and, at a minimum, a shared direction or North Star. These plans should be written down for long-term reference, team collaboration, and development. While planning may seem like a task for managers, even individual contributors can develop their skills or tools by taking part in the shared collaboration and team direction. Planning is a team effort that unites the team in a shared vision. Both offensive and defensive teams benefit from having a wiki to store and share team knowledge, which individual team members may have acquired over a long period of time.
A knowledge base may also be a code repository such as GitLab, or a simple document repository such as an SMB share with documents. It should enable sharing within the team and could be publicly hosted, on a private network, or even ephemerally shifting as a Tor onion service. Ultimately, the intent is that we maintain a common medium where team members can share plans, tools, and information regarding tools, techniques, and policy. This location should be accessible and the solution should be semi-permanent with an emphasis on long-term team support. Choosing a good wiki or note repository is critical. You may want a publicly hosted product with an API to enable automated integrations; you may want a privately hosted service or even something with open-source code that you can review. This decision depends on your risk tolerance and any requirements for confidentiality. You may want a strong authorization feature set, such that you can restrict pages and workspaces to specific users or groups. Compartmentalizing different development and operational details will help limit the damage if one of the operators is exploited or compromised. One feature that I've always appreciated is real-time, cooperative document editing, such as with Google Docs or Etherpad[1]. Collaborative document editing can be very effective for the real-time editing and review of policy across distributed teams. Another set of compelling features could be integrated alerting and email updates. A good example of a simple, self-hosted, open-source wiki application is DokuWiki, which I've used on various engagements[2]. While I've presented readers with many features and options, wiki solutions should be an easy choice for competition scenarios. In competition environments, focus on a simple, easily accessible solution that includes authentication and confidentiality controls, and promotes team collaboration.
A close second to knowledge-sharing technologies is real-time communication and chat technology. Communication is the lifeblood of any team. The closer communications get to real time, the quicker team members can iterate, develop, and collaborate on ideas together. Chat capabilities are critical for your team, so it's important to choose the right infrastructure, or at least leverage what you have. Even if your team has the luxury of all being in person, they will still need to send each other digital information, logs, and files. Generally speaking, chat or communications should be considered as whatever your primary method for digital interaction with your team is, for example, email, IRC, XMPP, Slack, Mattermost, Zoom, or even more ephemeral communications such as Etherpad. One major consideration is the ability to copy/paste directly into operations, so using something like traditional SMS may not work well for primary communications. You can take this a step further and supercharge your team's chat with chat-ops. Having the ability to issue group tasks directly from chat can give your team powerful automation abilities, such as the ability to publicly triage hosts or receive scan data from the networks, and share it in a chat room with the whole group.
I've used chat-ops on an incident response team in the past to quickly interrogate our entire fleet of machines for specific indicators of compromise, with the whole team present. We could also pull artifacts from hosts and quarantine machines directly from chat, making for very fast triage and response times while scoping an incident. If you go heavily into chat-ops, it is advisable to have dedicated rooms for this, as the bot traffic can overwhelm human conversation at times. Another feature you may want to consider in your chat application is the ability to encrypt chat logs at rest, something that provides additional confidentiality and integrity to the communication. This is supported in the Slack chat application as a paid feature, known as EKM, or Enterprise Key Management. EKM allows you to encrypt messages and logs with your own cryptographic keys stored in AWS KMS, or Amazon's Key Management Service[3]. Such features can be a lifesaver if part of your organization or infrastructure is compromised, because they allow you to compartmentalize different chat rooms and logs. It can also pay to have a contingency chat solution in place, so that team members have a fallback if their chat is compromised or they lose availability for whatever reason. A contingency chat solution would preferably have a strong cryptographic method for authenticating members, such as GPG keys or a solution such as Signal[4]. Having these pieces of infrastructure in place, including a knowledge base and an effective communication system, will greatly enable the team to develop their plans and further infrastructure cooperatively. These two components will be critical to both offensive and defensive teams alike.
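To make the chat-ops idea above more concrete, here is a minimal sketch of a command dispatcher that a chat bot could sit on top of. It is not tied to any particular chat platform: the `!` prefix, the command names `triage` and `quarantine`, and the reply strings are all hypothetical, and a real handler would call out to your fleet-management or EDR tooling rather than return a canned message.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical registry mapping chat command names to handler functions.
COMMANDS: Dict[str, Callable[[List[str]], str]] = {}


def command(name: str):
    """Decorator that registers a handler under a chat command name."""
    def register(func: Callable[[List[str]], str]):
        COMMANDS[name] = func
        return func
    return register


@command("triage")
def triage(args: List[str]) -> str:
    # Placeholder: a real handler would query fleet or EDR tooling here.
    if not args:
        return "usage: !triage <hostname>"
    return f"triage queued for {args[0]}"


@command("quarantine")
def quarantine(args: List[str]) -> str:
    # Placeholder: a real handler would request network isolation here.
    if not args:
        return "usage: !quarantine <hostname>"
    return f"quarantine requested for {args[0]}"


def handle_message(text: str, prefix: str = "!") -> Optional[str]:
    """Return a bot reply for a command message, or None for ordinary chat."""
    if not text.startswith(prefix):
        return None
    parts = text[len(prefix):].split()
    if not parts:
        return None
    name, args = parts[0], parts[1:]
    handler = COMMANDS.get(name)
    if handler is None:
        return f"unknown command: {name}"
    return handler(args)
```

Because ordinary conversation returns `None`, the same function can be attached to a dedicated bot room without replying to human chatter. Any command that changes state, such as quarantining a host, would also need an authorization check before it is exposed to a whole room.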
Long-term planning
Long-term planning is some of the most important planning your group can do. It will allow you to set a theme for your group and give the team an overarching direction and avenue to express their innovative ideas. The length of your long-term planning cycle depends on the scope of your operations. For competitions, this could be an annual cycle, or you could start planning only weeks before the competition. Generally speaking, a long-term plan can be anything that helps you prepare for an operational engagement during your downtime. You can also iterate on these plans over time, such as adding or removing milestones as an operation develops and new needs arise. Some examples of long-term plans are three-year to five-year plans, annual plans, quarterly plans, and monthly plans; sometimes a long-term plan can even be the preparations for a single event. As an example, from a competition perspective, this could mean using the months prior to develop a training and hunting plan. Higher-level planning may seem frivolous, but in general, the team should have an idea of its general direction, and it is best to write this down to ensure all are in agreement.
Over time, these larger plans may be broken down into milestone objectives to help team members digest the individual projects involved and to timebox the different tasks. These milestone objectives will help determine whether progress is being made according to plan and on schedule. Time is one of your most precious resources in terms of economy and planning, which is why starting the planning sooner can help you tackle large tasks and potential time sinks. You will want to use your downtime to develop tools and automations to make your operational practices faster. For example, if your team is spending a lot of time auditing user access and rotating credentials, you could plan to develop a tool to help audit the users of a local machine and domain. Long-term planning should involve the creation of projects, which then encompass the development of infrastructure, tools, or skill improvements you want to make available to the group. Make sure you budget extra time for projects and milestones to allow for pivoting or error along the way. This also means making sure you don't overtask individuals or take on more projects than you have resources for. The benefit of long-term planning is in building up your capabilities over time, so do not rush your project development and burn your team out early. Conversely, if you fail completely at long-term planning, you may find yourself in a cyber conflict technically unprepared, scrambling to get tooling in place, or simply blind to your opponent's actions.
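As an illustration of the kind of small tool mentioned above for auditing the users of a local machine, the following is a minimal sketch that reviews account data in the Unix `/etc/passwd` format. The two checks, the threshold of UID 1000 for regular users, and the list of non-login shells are assumptions chosen for the example, not a complete audit policy; the sample data is invented, and domain accounts are out of scope here.

```python
from typing import List, NamedTuple


class Account(NamedTuple):
    name: str
    uid: int
    shell: str


# Assumption: shells commonly used to disable interactive logins.
NOLOGIN_SHELLS = {"/usr/sbin/nologin", "/sbin/nologin", "/bin/false"}
# Assumption: regular user accounts start at UID 1000 (varies by distribution).
FIRST_USER_UID = 1000


def parse_passwd(text: str) -> List[Account]:
    """Parse passwd-format text (name:pw:uid:gid:gecos:home:shell)."""
    accounts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 7 or not fields[2].isdigit():
            continue  # skip malformed entries rather than guess
        accounts.append(Account(fields[0], int(fields[2]), fields[6]))
    return accounts


def audit(accounts: List[Account]) -> List[str]:
    """Return human-readable findings worth a second look."""
    findings = []
    for acct in accounts:
        if acct.uid == 0 and acct.name != "root":
            findings.append(f"{acct.name}: UID 0 but not root")
        elif acct.uid >= FIRST_USER_UID and acct.shell not in NOLOGIN_SHELLS:
            findings.append(f"{acct.name}: interactive login shell")
    return findings


# Invented sample data for demonstration only.
SAMPLE = """root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
backdoor:x:0:0::/tmp:/bin/sh
svc:x:1001:1001::/home/svc:/usr/sbin/nologin
"""

if __name__ == "__main__":
    for finding in audit(parse_passwd(SAMPLE)):
        print(finding)
```

On a live Linux host, the same functions could be fed the contents of `/etc/passwd` instead of `SAMPLE`. Keeping the parsing separate from the checks makes it easier to add new checks as the team's audit needs grow.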
No plans are perfect. You need to be able to measure how close you are getting to your objective, and make course corrections if something is not going according to plan. Contingency plans should be available if goals, objective milestones, or metrics aren't being met. This will be a major theme of this chapter, as we touched on in the principle of planning. Throughout this book, we will be looking for ways to measure and test our techniques and make sure our plans are on schedule. As we saw with the principle of time, the timing of our plans is absolutely critical when playing against an adversary, so we need to know when to pivot to maintain the advantage. If we start to get data contrary to our plan, such as signs that some techniques may be detected, we need to modify our plans and potentially our tooling to support our new strategies. This is rooted in our principle of innovation: if our strategy is discovered, we will lose our advantage, so we should be prepared to pivot our operations in that situation. Former UFC champion Georges St-Pierre said, "Innovation is very important to me, especially professionally. The alternative, standing pat, leads to complacency, rigidity and eventually failure. Innovation, to me, means progression, the introduction of new elements that are functional and adaptable to what I do"[5]. As you go through your long-term planning, consider blocking off time for ad-hoc or unspecified research, tool development, or even process refinement. These flexible gaps in long-term plans allow for pivots to be incorporated more easily. If a plan goes awry, these gaps can be easily sacrificed for course correction. Otherwise, if the plan succeeds, they can be capitalized on for process improvement.
Expertise
One of the most important things you can prepare is knowledge. Hire for experience and talent, but also passion and team fit. It is important to build a quality team, in terms of expertise, experience, and capabilities, instead of a large quantity of bodies to throw at a problem. One of the unique aspects of computer science is the ability to both automate and scale solutions. This means an innovative engineer could automate a solution or part of a solution to a task that several people would otherwise perform manually. That said, you will absolutely need a team. There are simply too many areas of complex infrastructure and knowledge to manage with only a few people. Long-term plans should include owners in areas of subject-matter expertise. While you should generally be prepared for a wide set of digital environments, especially in regard to competition environments, it helps to know about your target environment and the types of systems that you will encounter there. In this book, we will primarily be focusing on Windows and Linux-based operating systems. Basic examples of expertise you could have on a CCDC team, either on offense or defense, include Windows strengths, Unix capabilities, web application experience, incident response prowess, red team abilities, and even reverse-engineering competencies. Lots of other skills also apply, such as vulnerability scanning, network monitoring, domain hardening, and infrastructure engineering abilities, to name a few. Areas you decide to invest in, in terms of expertise, should mirror your overall strategy and should be stacked toward your desired strengths. This means you should also invest in infrastructure and tooling that supports these areas of expertise and have members of your team cross-trained around your chosen expertise.
Contingency plans, in terms of the team's expertise, mean having backup team members trained in those areas and developing a plan for cross-training resources. Cross-training can be in the form of weekly educational meetings, brown bags, or even quarterly formal training programs. Your group should be meeting regularly, which is a good time to exchange recent lessons learned. You can follow this up with individual training programs around skills team members are looking to improve. Formal training courses can be some of the best ways to upskill people quickly in areas you are interested in. SANS, for instance, is an incredible resource for cyber education, but the price is significant if you're on a tight budget[6]. Many free resources also exist in terms of cyber training, but the most important thing is to give employees dedicated time for training. One of my favorite free resources for low-level technical skills is https://opensecuritytraining.info/, which includes over 23 high-quality courses, many with videos[7]. Another interesting site for free education courses is Cybrary. While these courses aren't as in-depth as OpenSecurityTraining, their Career Paths include many relevant skills and their courses have a high production finish[8].
You can even turn this into value for the whole team by having them present on the topics to the group after they learn a new technique or skill. Even experienced practitioners will need time to practice new skills and continue their education. While training is great, nothing is a substitute for real experience. Newly trained team members will bring a lot to the table, but you need to make sure they put those skills into practice as soon and for as long as possible. You should have junior team members shadow experienced team members on their operations or in practice, if time allows. I also like to use new members to make sure documents are up to date and have them take additional notes for the wiki during these shadow sessions.
Operational planning
Operational planning is anything that helps operators prepare and navigate through an upcoming engagement. Operational planning can take the form of runbooks to help operators with basic information, workflows, or technical tasks. Operational planning can also be high-level goals and tenets of a mission, such as a rule that operators should abide by. This planning allows for smooth processes and for operators to help themselves when they get stuck. Operational planning can be either generic to all operations or specific to a target engagement. Tailored plans should be crafted per engagement, including overall goals and special considerations for that operation. In real operations, this would typically involve a lot of reconnaissance, making sure the target technologies or threat actors are appropriately scoped. In a competition setting, this can look like making a spreadsheet with every host in the environment and highlighting the ones running critical services. You can then assign team members tasks on servers and move through them systematically, either triaging or exploiting them. Operational planning can also be thought of as policy or procedures for the team. This level of planning, creating a policy with supporting runbooks, can also help make sure processes are operationally secure. Automating the individual operations within the plan can be a great innovation for any team. For example, one operational runbook may instruct operators to use a VM for operations, to help reduce endpoint compromise, malware spread, and operator identification. A team member could innovate on this policy by creating that golden VM image for the team, and potentially automating the deployment of these VMs for other team members. Furthermore, these VMs could also come with all of the appropriate tooling and network configurations the operators need.
Any of this automation should also be documented, and the original runbook should be updated with the new details of the automation. If the project grows enough, consider turning it into a supported long-term project with a proper development life cycle.
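The competition host spreadsheet described earlier is a natural candidate for this kind of automation. The following is a minimal sketch that takes a list of discovered hosts, flags the ones running critical services, assigns operators round-robin, and emits CSV that can be opened as a spreadsheet. The port-to-service mapping, the column names, the host addresses, and the operator names are all assumptions made for the example; in practice the host list would come from your own scan data, and the set of critical services depends on the environment.

```python
import csv
import io
import ipaddress
from itertools import cycle
from typing import Dict, List

# Assumption: ports treated as "critical services" for this example.
CRITICAL_PORTS = {25: "smtp", 53: "dns", 80: "http", 443: "https"}
COLUMNS = ["ip", "critical", "services", "owner", "status"]


def is_critical(host: Dict) -> bool:
    return any(port in CRITICAL_PORTS for port in host["ports"])


def build_rows(hosts: List[Dict], operators: List[str]) -> List[Dict]:
    """Order hosts (critical first, then by IP) and assign owners round-robin."""
    if not operators:
        raise ValueError("at least one operator is required")
    ordered = sorted(
        hosts,
        key=lambda h: (not is_critical(h), ipaddress.ip_address(h["ip"])),
    )
    owners = cycle(operators)
    rows = []
    for host in ordered:
        services = [CRITICAL_PORTS[p] for p in sorted(host["ports"])
                    if p in CRITICAL_PORTS]
        rows.append({
            "ip": host["ip"],
            "critical": "yes" if services else "no",
            "services": " ".join(services),
            "owner": next(owners),
            "status": "todo",
        })
    return rows


def to_csv(rows: List[Dict]) -> str:
    """Render the rows as CSV text suitable for a shared spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Invented sample data for demonstration only.
SAMPLE_HOSTS = [
    {"ip": "10.0.0.20", "ports": [22]},
    {"ip": "10.0.0.5", "ports": [22, 80]},
    {"ip": "10.0.0.9", "ports": [53]},
]

if __name__ == "__main__":
    print(to_csv(build_rows(SAMPLE_HOSTS, ["alice", "bob"])), end="")
```

Sorting critical hosts to the top means the first assignments handed out are the ones that matter most, and the `status` column gives the team a simple shared place to track progress through the list.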
Ultimately though, runbooks should provide guidance on a technique or process to a team member looking for clarification on an operation. The runbook should link to external information that enriches the subject and provides context as to why a tool or process may determine something. Some of the most useful runbooks also provide anecdotal experiences, links to corner cases, or links to references where team members borrowed a previous implementation. Runbooks could also include common flags to look out for if a process is going wrong or if deceptive tactics are at play. Plans should then include contingencies, such as creating an incident if you think there are deceptive practices at work or pivoting to a live response if you think tools aren't reporting properly. Keeping runbooks focused and atomic in terms of scope will help make them flexible and chainable into different operational plans. Maintaining operational goals and runbooks is one way to prepare your team for the high-pressure and fast-paced action of cyber conflict, especially in a competition setting.
Another operational planning consideration is finding a way to measure your team's operational progress. KPIs, or key performance indicators, can help you understand how the group is working over time. It is often best that KPIs or metrics are recorded and collected automatically; automation saves the painstaking process of manually gathering metrics for management. Because the game of computer security is asymmetric, we will look at individual metrics that either offense or defense can use to measure their operations. Even within offense and defense, KPIs can often be very role-specific since you are evaluating role performance and efficiency. That said, later sections in this chapter will have some example KPIs for different roles. It is also worth mentioning again that computer science is extremely complex, so a KPI may sometimes capture other factors and not truly measure the targeted element. A good example of this may be a defensive team trying to achieve the fabled 1/10/60 time of detection, investigation, and response speeds[9]. If they are using a cloud-based EDR service, there may be a delay in the ingestion and processing of the logs with that service, such as three to five minutes to receive and process an alert in their cloud. That means that no matter how finely tuned the defensive team's infrastructure is, they will never be able to detect an incident within a minute of it happening while using such a cloud service. It is important to understand what's feasible in your environment when setting metrics, and that may even require several rounds of measuring before you determine a baseline.
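The 1/10/60 example can be worked through with a small calculation. The sketch below computes the three phase durations for a single incident, compares them against the targets of 1, 10, and 60 minutes, and checks whether a target is even achievable given a minimum pipeline delay. The function names and sample timestamps are hypothetical, and measuring each phase from the end of the previous one is an assumption; some teams measure cumulatively from the time of occurrence instead.

```python
from datetime import datetime, timedelta
from typing import Dict

# The 1/10/60 targets: detect, investigate, and respond, in minutes.
TARGETS = {
    "detect": timedelta(minutes=1),
    "investigate": timedelta(minutes=10),
    "respond": timedelta(minutes=60),
}


def incident_kpis(occurred: datetime, detected: datetime,
                  investigated: datetime, contained: datetime) -> Dict[str, timedelta]:
    """Duration of each phase, measured from the end of the previous phase."""
    return {
        "detect": detected - occurred,
        "investigate": investigated - detected,
        "respond": contained - investigated,
    }


def meets_targets(kpis: Dict[str, timedelta]) -> Dict[str, bool]:
    """True for each phase whose duration is within its target."""
    return {phase: kpis[phase] <= TARGETS[phase] for phase in TARGETS}


def target_is_feasible(phase: str, min_pipeline_delay: timedelta) -> bool:
    """A target cannot be met if the tooling alone takes longer than it allows."""
    return min_pipeline_delay <= TARGETS[phase]


if __name__ == "__main__":
    # Invented timestamps for demonstration only.
    kpis = incident_kpis(
        occurred=datetime(2021, 1, 1, 12, 0),
        detected=datetime(2021, 1, 1, 12, 4),
        investigated=datetime(2021, 1, 1, 12, 12),
        contained=datetime(2021, 1, 1, 12, 50),
    )
    print(meets_targets(kpis))
    # With a three-minute ingestion delay, one-minute detection is out of reach.
    print(target_is_feasible("detect", timedelta(minutes=3)))
```

In this sample, detection takes four minutes, so the one-minute target is missed even though investigation and response are within their targets. With a three-minute ingestion delay, the detection target is infeasible regardless of how well the team performs, which is the point of establishing a baseline before committing to a metric.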
How your group plans to end a specific engagement should be decided in advance, during the engagement planning. While we will have a specific chapter on this at the end of the book (Chapter 8, Clearing the Field), we will want to plan for how operations successfully end.
From the defensive perspective, this means planning and implementing capabilities that will allow you to evict an attacker from your environment. The defense needs the ability to do root cause analysis (RCA) and understand how the attacker got in and patch that vulnerability before they can exploit it again. From the offensive perspective, this could help determine when we will exit the target environment. You will also want to plan for the event that the operation takes an unexpected turn or goes in the opponent's favor. For the offense, this means planning how we will respond if the campaign is uncovered, our tools are exposed publicly, or even our operators are identified. This is often thought of as program security (Network Attacks and Exploitation: A Framework, Matthew Monte, page 110). As Monte describes it, "Program security is the principle of containing damage caused during the compromise of an operation. And no matter how good the Attacker is, some operations will get compromised. […] You do not want the failure of one operation impacting another." It is vitally important to consider how the offense will exfiltrate and exit their target environment after they have reached their goal. Similarly, the offense should consider what a successful response by the defense looks like and when they will choose to exit the environment or spend more resources to reengage. This is equally important to consider from a defensive perspective unless you want to relive a past compromise event.
This chapter moves from planning to setting up the infrastructure and tooling that each side should have in place to support their operations. Both sides will have a great deal of standing infrastructure. Tooling is critical to the team's operations, but due to the asymmetric nature of the game, I will cover each in their own section, one for each side. Even if you are not on the other team in your average role, I urge you to understand their tooling and infrastructure. As Sun Tzu says, "If you know the enemy and know yourself, you need not fear the result of a hundred battles." I cannot overstate the importance of understanding your opponent's tools and capabilities, as this outlines the options your opponent has available. I think Dave Cowen, the leader of the National CCDC red team, is a great example of this. For his day job, Dave is an incident response director, aiding defensive operations against real attackers. In his free time, Dave leads the volunteer red team, letting him think like an attacker and explore offensive techniques hands-on. Furthermore, if you can exploit your opponent's security infrastructure, you will gain a massive advantage in the conflict. In the following sections, we will see how much of the technology on both sides involves a great deal of standing infrastructure that in turn becomes a potential target itself.