The traffic lights on your way to work if you go by car; the collision avoidance system if you take the train or metro; the delivery of electricity that powers the light you use to read this book; the processing and packaging that went into creating the jug of milk in your fridge or the coffee grind for that cup of Joe that fuels your day... What all these things have in common is the ICS driving the measurements, decisions, corrections, and other miscellaneous actions that result in the end products and services we take for granted each day.
Strictly speaking, an ICS is a collection of equipment, devices, and communication methods that, when combined to form the foundational system, perform a specific task, deliver a service, or create a particular product. Figure 1.1 shows an ICS architecture, spanning the various layers of functionality as described in the Purdue model (explained in a later section).
ICS functions
The following screenshot shows a typical ICS architecture, following the Purdue model and stretched out across the industrial and enterprise networks of an organization. It will be used as an illustration for the following sections:
Figure 1.1 – Typical ICS architecture
Within the ICS architecture shown in the preceding screenshot, the following main types of devices within the three main sections of the architecture can typically be distinguished:
- The Enterprise Zone is predominantly IT space. Devices, systems, and equipment typically found here are computer-related, such as servers, workstations, and laptops, as well as mobile devices such as phones, tablets, handhelds, and others. These devices are connected together with various Ethernet equipment and media, including switches, wireless access points, routers, firewalls, and the cables that connect all of these devices (Category 6 (Cat6)/Cat6e media).
- The Industrial Demilitarized Zone (IDMZ) functions as a barrier between the Enterprise Zone and the Industrial Zone and is typically implemented as a collection of virtualization hardware, firewalls, and switches.
- In the Industrial Zone, we can find a variety of regular off-the-shelf IT equipment, along with proprietary and specialized hardware that is used to run the production process. In an upcoming section, ICS architecture, we will discuss some of the more common systems that can be found in the Industrial Zone.
The ultimate goal of an ICS is to create a product or run a process. This goal is achieved by implementing distinct functions within the ICS that, when combined, allow for control, visibility, and management of the production or process control. We will now look at typical functions found within an ICS.
The view function
The view function encompasses the ability to watch the current state of the automation system in real time. This data can be used by operators, supervisors, maintenance engineers, or other personnel to make business decisions or perform corrective actions. For example, when an operator sees that the temperature of boiler 1 is getting low, they might decide to increase the steam supply of the boiler to compensate. The view process is passive in nature, merely providing the information or "view" for a human to react to.
The view function is presented in the following diagram:
Figure 1.2 – The view function
From a security perspective, if an attacker can manipulate the operator's view of the status of the control system—or, in other words, can change the values the operator makes decisions on—the attacker effectively controls the reaction and, therefore, the complete process. For example, by manipulating the displayed value for the temperature of boiler 1, an attacker can make the operator think the temperature is too low or too high and have them act upon manipulated data.
The monitor function
The monitor function is often part of a control loop, such as the automation behind keeping a steady level in a tank. The monitor function will keep an eye on a critical value such as pressure, temperature, or level, comparing the current value against predefined threshold values, and will alarm or interact depending on the setup of the monitoring function. A key difference between the view function and the monitor function is in the determination of deviation. With monitoring functions, this determination is an automated process, whereas with a view function, that determination is made by a human looking at the values. The reaction of the monitor function can range from a pop-up alarm screen to a fully automated system shutdown procedure.
From a security perspective, if an attacker can control the value that the monitor function is looking at, the reaction of the function can be triggered or prevented—for example, in the case where a monitoring system is looking at the temperature of boiler 1, preventing the temperature exceeding 300 °F. If an attacker feeds a value of less than 300 °F into the system, that system will be tricked into believing all is well while, in the meantime, the system can be in meltdown.
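The boiler example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Monitor class, its field names, and the temperature readings are hypothetical, and a real monitor function would live in a PLC or alarm server rather than a script. The point is that the monitor can only judge the value it is handed.

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    """Hypothetical monitor function for a single process value."""
    name: str
    high_limit: float  # alarm threshold, e.g. 300 °F for boiler 1

    def check(self, reported_value: float) -> str:
        # The monitor only sees the reported value; it cannot tell
        # whether that value matches the physical process.
        if reported_value > self.high_limit:
            return "ALARM"
        return "OK"


boiler_monitor = Monitor(name="boiler_1_temp_F", high_limit=300.0)

actual_temperature = 450.0   # what the process is really doing (made up)
spoofed_temperature = 280.0  # what the attacker feeds the monitor (made up)

honest_result = boiler_monitor.check(actual_temperature)
spoofed_result = boiler_monitor.check(spoofed_temperature)
print(honest_result, spoofed_result)
```

With the honest reading the monitor alarms; with the spoofed reading it reports that all is well, even though the process has not changed.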
The control function
The control function is where things are manipulated, moved, activated, and initiated. The control system is what makes actuators engage, valves open, motors run... The control actions can be initiated by an operator either pushing a button or changing a setpoint on a Human-Machine Interface (HMI) screen, or it can be an automated response as part of the process control.
The control function is presented in the following diagram:
Figure 1.3 – The control function
From a security perspective, if an attacker can manipulate the values (the input) the control system reacts on, or if they can change or manipulate the control function itself (the control program), the system can be tricked into doing things it wasn't designed to do or intended for.
Now, I can hear you all say, that is all fine and dandy manipulating values, but surely that cannot be done with modern switched networks and encrypted network protocols. That would be true if those technologies were implemented and used. But the fact is that on most, if not all, ICS networks, the confidentiality and integrity of industrial network traffic are of less importance than the availability of the ICS. Even worse, for most ICSes, availability ends up being the only design consideration when architecting the system. Combine that with the fact that the ICS communication protocols running on these networks were never designed with security in mind, and you can start to see the feasibility of the scenarios mentioned. Most automation protocols were introduced when computer networks were not yet touching automation devices, for media that was never meant to share data across more than a point-to-point link, so security around authentication, confidentiality of data, or integrity of sent commands was never implemented. Later, those point-to-point protocols were adapted to work on communication equipment such as Ethernet, which exposed the insecure protocols to the entire production floor, the plant, or even out to the internet.
ICS architecture
ICS is an all-encompassing term used for various automation systems and their devices, such as Programmable Logic Controllers (PLCs), HMIs, Supervisory Control And Data Acquisition (SCADA) systems, Distributed Control Systems (DCSes), Safety Instrumented Systems (SIS), and many others.
The ICS architecture is presented in the following diagram:
Figure 1.4 – Large-scale ICS architecture
PLCs
PLCs are at the heart of just about every ICS. They are the devices that take data from sensors via input channels and control actuators via output channels. A typical PLC consists of a microcontroller (the brains) and an array of input and output (I/O) channels. I/O channels can be analog, digital, or network-exposed values. These I/O channels often come as add-on cards that attach to the backplane of a PLC. This way, a PLC can be customized to fit many different functions and implementations. Programming of a PLC can be done via a dedicated Universal Serial Bus (USB) or serial interface on the device or via the network communications bus that is built into the device, or comes as an add-on card. Common networking types in use are Modbus, Ethernet, ControlNet, and PROFINET.
An example of a mounted PLC is provided in the following screenshot:
Figure 1.5 – An Allen-Bradley rack-mounted PLC
PLCs can be deployed as standalone devices, controlling a certain part of the manufacturing process such as a single machine, or they can be deployed as distributed systems, spanning multiple plants in dispersed locations with thousands of I/O points and numerous interconnecting parts.
HMI
An HMI is the window into the control system. It visualizes the running process, allowing inspection and manipulation of process values, showing of alarms, and trending of control values. In its simplest form, an HMI is a touch-enabled standalone device that communicates via a serial or Ethernet-encapsulated protocol.
Some examples of HMIs are presented in the following screenshot:
Figure 1.6 – HMIs
More advanced HMI systems can use distributed servers to offer a redundant supply of HMI screens and data. An example of one such system is presented in the following screenshot:
Figure 1.7 – FactoryTalk View SE Distributed HMI system
The preceding screenshot shows an example of a Rockwell Automation FactoryTalk View Site Edition (SE) distributed HMI application.
SCADA
SCADA is a term used to describe a combined use of ICS types and devices, all working together on a common task. The following screenshot shows an example SCADA network. Here, the SCADA network comprises all the equipment and components that together form the overall system:
Figure 1.8 – SCADA
As depicted in the preceding screenshot, SCADA systems can be spread out over a wide geographical area, being applied to the power grid, water utilities, pipeline operations, and other control systems that use remote operational stations.
DCS
Closely related to a SCADA system is the DCS. The differences between a SCADA system and a DCS are very small, and the two are becoming more indistinguishable all the time. Traditionally, though, SCADA systems have been used for automation tasks that cover a larger geographical area, whereas a DCS is more often confined to a single plant or facility. A DCS is often a large-scale, highly engineered system with a very specific task. It uses a centralized supervisory unit that can control thousands of I/O points. The system is built to last, with redundancy applied to all levels of the installation.
An example DCS is presented in the following screenshot:
Figure 1.9 – DCS
As depicted in the preceding screenshot, DCSes use redundant networks and network interfaces, attached to redundant server sets and connected to redundant controllers and sensors, all with the goal of creating a rigid and solid automation platform in mind. DCSes are most commonly found in water management systems, paper and pulp mills, sugar refinery plants, and so on.
The distributed nature of a DCS makes it more difficult to secure as it often has to break network section boundaries, and the sheer amount of human interaction with the DCS creates a greater chance of malware infections.
SIS
SISes are dedicated safety monitoring systems. They are there to safely and gracefully shut down the monitored system or bring that system to a predefined safe state in case of a hardware malfunction. A SIS uses a set of voting systems to determine whether a system is performing normally. If a safety system is configured to shut down the process of a machine when unsafe conditions are detected, it is considered an Emergency Shutdown (ESD) system.
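To make the voting idea concrete, here is a minimal sketch of one common arrangement, two-out-of-three (2oo3) voting. The choice of 2oo3, the function names, and the pressure values are assumptions made for illustration; the text above does not specify which voting scheme a given SIS uses.

```python
def two_out_of_three(trip_a: bool, trip_b: bool, trip_c: bool) -> bool:
    """Return True when at least two of the three sensors demand a trip."""
    return (trip_a + trip_b + trip_c) >= 2


def sis_action(pressures: list[float], trip_limit: float) -> str:
    """Decide between RUN and SHUTDOWN from three redundant readings."""
    # Each sensor casts a vote: True means "unsafe, trip the process".
    votes = [pressure > trip_limit for pressure in pressures]
    return "SHUTDOWN" if two_out_of_three(*votes) else "RUN"


# Hypothetical readings from three redundant pressure transmitters.
one_sensor_high = sis_action([90.0, 95.0, 125.0], trip_limit=100.0)
two_sensors_high = sis_action([90.0, 120.0, 125.0], trip_limit=100.0)
print(one_sensor_high, two_sensors_high)
```

In this sketch, a single out-of-range sensor does not stop the process, while agreement between two sensors does, which is the trade-off between nuisance trips and missed trips that voting is meant to balance.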
An example of an SIS is presented in the following screenshot:
Figure 1.10 – SIS
Safety systems were initially designed to be standalone and disconnected monitoring systems (think bolt-on, local device/system inspection), but the trend over the past years has been to start attaching them to the industrial network, adding an easy way of (re)configuring them but also exposing them to potential attacks with all the accompanying risks. An ESD could be misused by potential attackers. They could reconfigure the SIS to shut down the system to cause financial loss for the company, or instruct the SIS not to shut down when required with the aim of causing physical damage to the operation, with the disastrous side effect that people's lives are at stake.
Consider, for example, the TRITON attack/malware campaign that targeted SIS systems back in 2017:
https://www.nozominetworks.com/blog/new-triton-ics-malware-is-bold-and-important/#:~:text=The%20attack%20reprogrammed%20a%20facility%E2%80%99s%20Safety%20Instrumented%20System,impacted%20not%20just%20an%20ICS%2C%20but%20SIS%20equipment
The Purdue model for ICSes
So, how does all this tie together? What makes for a solid ICS architecture? To answer that question, we should first discuss the Purdue reference model—or Purdue model, for short. Shown in the next screenshot, the Purdue model was adopted from the Purdue Enterprise Reference Architecture (PERA) model by ISA-99 and is used as a concept model for ICS network segmentation. It is an industry-adopted reference model that shows the interconnections and interdependencies of all the main components of a typical ICS. The model is a great resource to start the process of figuring out a typical modern ICS architecture and is presented here:
Figure 1.11 – The Purdue model
The Purdue model divides the ICS into four distinct zones and six levels. The following sections will describe the zones and levels, combining the bottom two zones into the Industrial Zone.
The Enterprise Zone
The part of the ICS that business systems and users directly interact with resides in the Enterprise Zone.
This is depicted in the following screenshot:
Figure 1.12 – The Enterprise Zone
The Enterprise Zone can be subdivided into Level 5 (Enterprise Network) and Level 4 (Site Business Planning and Logistics). Note that not all companies' Enterprise Zones will necessarily have a Level 5, and some might combine levels 5 and 4.
Level 5 – Enterprise Network
The Enterprise Zone is the part of the network where business systems such as Enterprise Resource Planning (ERP) and Systems Applications and Products (SAP) typically live. Here, tasks such as scheduling and supply chain management are performed. The systems in this zone normally sit at a corporate level and span multiple facilities or plants. They take data from subordinate systems that are located out in the individual plants and use the accumulated data to report on overall production status, inventory, and demand. Technically not part of the ICS, the Enterprise Zone does rely on connectivity with the ICS networks to feed the data that drives business decisions.
Level 4 – Site Business Planning and Logistics
Level 4 is home to all the IT systems that support the production process in a plant or facility. These systems report production statistics such as uptime and units produced to corporate systems, and take orders and business data down from the corporate systems to be distributed among the OT or ICS systems.
Systems typically found in level 4 include database servers, application servers (web, report, the Manufacturing Execution System (MES)), file servers, email clients, supervisor desktops, and so on.
The IDMZ
Between the Enterprise Zone and the Industrial Zone lies the IDMZ, depicted in the following screenshot:
Figure 1.13 – The IDMZ
The IDMZ contains a single level: level 3.5.
Level 3.5 – The IDMZ
As the level number might imply, level 3.5 was added to the model later. It stems from the efforts taken to create security standards such as the National Institute of Standards and Technology (NIST) Cybersecurity Framework and North American Electric Reliability Corporation Critical Infrastructure Protection (NERC CIP). The IDMZ is an information-sharing layer between the business or IT systems in levels 4 and 5, and the production or OT systems in levels 3 and below. By preventing direct communication between IT and OT systems, but rather having a broker service in the IDMZ relay communications, an extra layer of separation and inspection is added to the overall architecture. Systems in the lower layers are not being exposed directly to attacks or compromise. If, at some point, something were to compromise a system in the IDMZ or above, the IDMZ could be shut down, the compromise contained, and production could continue.
Systems typically found in the IDMZ include (web) proxy servers, database replication servers, Network Time Protocol (NTP) servers, file transfer servers, Windows Server Update Services (WSUS) servers, and other transitional (broker) service servers. The IDMZ tends to be a virtual stack to allow flexibility when building broker services and implementing redundancy, failover, and easy restore functionality.
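The rule that IT and OT systems never talk to each other directly, but only through a broker in the IDMZ, can be expressed as a simple check on the Purdue levels of the two endpoints. The following is a rough sketch only: the function names are hypothetical, and a real firewall policy would also consider ports, protocols, and direction.

```python
def zone_of(level: float) -> str:
    """Map a Purdue level to its zone as described in this chapter."""
    if level >= 4:
        return "enterprise"   # levels 4 and 5
    if level == 3.5:
        return "idmz"         # level 3.5
    return "industrial"       # levels 3 through 0


def flow_allowed(src_level: float, dst_level: float) -> bool:
    """Deny any flow that would connect enterprise and industrial directly."""
    zones = {zone_of(src_level), zone_of(dst_level)}
    return zones != {"enterprise", "industrial"}


# Hypothetical flows, given as (source level, destination level).
direct_it_to_ot = flow_allowed(4, 2)        # MES server straight to an HMI
it_to_broker = flow_allowed(4, 3.5)         # MES server to an IDMZ broker
broker_to_ot = flow_allowed(3.5, 3)         # IDMZ broker to a level 3 server
print(direct_it_to_ot, it_to_broker, broker_to_ot)
```

A check like this could be run against an exported list of observed network flows to flag conversations that bypass the IDMZ.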
The Industrial Zone
At the heart (or bottom) of the ICS is the Industrial Zone; this is the part of the ICS environment we are trying to protect by shielding it off from the rest of the world. The ultimate goal is to have most of the user interactions occurring on the Enterprise network/zone, where systems can be more easily patched, monitored, and contained. Any traffic, data, or interactions that need to trickle down to production systems do so via tightly defined and well-configured methods (broker services; see later) in the IDMZ, and are shielded from directly manipulating the production and automation systems and devices.
The Industrial Zone is depicted in the following diagram:
Figure 1.14 – The Industrial Zone
The Industrial Zone consists of levels 3-0, explained in the next sections.
Level 3 – Site Operations
Level 3 is where systems reside that support plant-wide control and monitoring functions. At this level, the operator is interacting with the overall production systems. Think of centralized control rooms with HMIs and operator terminals that give an overview of all the systems that run the processes in a plant or facility. The operator uses these HMI systems to perform tasks such as quality control checks, managing uptime, and monitoring alarms, events, and trends.
Level 3, Site Operations, is also where the OT systems live that report back to IT systems in level 4. Systems in lower levels send production data to data collection and aggregation servers in this level, which can then send the data up to higher levels or can be queried by systems in higher levels (push versus pull operations).
Systems typically found in level 3 include database servers, application servers (web, report), file servers, Microsoft domain controllers, HMI servers, engineering workstations, and so on. These types of systems can be found on the Enterprise network as well, but here they interact with the production process and data. The Microsoft domain controller at Level 3, Site Operations, should be used to implement a standalone industrial domain and Active Directory that is in no way tied to the Enterprise domain. Any link from an Enterprise domain to the Industrial Zone can allow the propagation of attacks or malware from the Enterprise Zone down into the industrial environment.
Level 2 – Area Supervisory Control
Many of the functions and systems in level 2 are the same as for level 3 but are targeted more toward a smaller part or area of the overall system. In this level, specific parts of the system are being monitored and managed with HMI systems. Think along the lines of a single machine or skid with a touchscreen HMI to start or stop the machine or skid, and to see some basic running values and manipulate machine- or skid-specific thresholds and setpoints.
Systems typically found in level 2 include HMIs (standalone or system clients), supervisory control systems such as a line-control PLC, engineering workstations, and so on.
Level 1 – Basic Control
Level 1 is where all the controlling equipment lives. The main purpose of the devices in this level is to open valves, move actuators, start motors... Typically found in level 1 are PLCs, Variable-Frequency Drives (VFDs), dedicated proportional–integral–derivative (PID) controllers, and so on. Although you could find a PLC in level 2, its function there is of a supervisory nature instead of a controlling one.
Level 0 – Process
Level 0 is where the actual process equipment lives that we are controlling and monitoring from the higher levels. Also known as Equipment Under Control (EUC), level 0 is where we can find devices such as motors, pumps, valves, and sensors that measure speed, temperature, or pressure. As level 0 is where the actual process is performed and where the product is made, it is imperative that things run smoothly and in an uninterrupted fashion. The slightest disruption to a single device can cause mayhem to all operations.
IT and OT convergence and the associated benefits and risks
ICSes started their life as proprietary implementations of automation and controls equipment, often standalone but, where necessary, glued together with obscure vendor-specific communication media and protocols. It would take equally obscure methods and tools to reprogram these setups, if reprogramming was possible at all. In those early days of ICS, there was a clear distinction and a solid boundary between OT and IT, though over the past decade, that boundary has all but dissolved.
The difference between IT and OT
Operational Technology, or OT, encompasses everything needed to get a product out the door, get a service delivered, or perform any other form of production activity. In the most modern sense, typical OT equipment that falls under this term includes PLCs, HMIs, actuators, sensors, PCs, servers, network switches, and network media, but also the software that runs on or with these devices, such as operating systems, firmware, applications, and databases. In contrast, the term Information Technology, or IT, encompasses all the equipment and processes involved with storing, retrieving, and sending information. Some typical IT equipment includes PCs, servers, network switches, network media, firewalls, other security appliances, and the various software, firmware, databases, and so on that run on and with the IT equipment.
Right away, you can see how the two terms have an overlapping area when IT equipment and processes are used as part of the OT environment. As we will discuss later in this chapter, this stems from the convergence of OT and IT that has been progressing over the past years.
Where IT and OT truly differ is in the way they are being utilized and implemented. Whereas IT equipment hardly ever physically controls anything (make something move; heat up; react…), that is the sole purpose of OT equipment. IT is part of the OT environment so that there are information capabilities available for the OT processes to react on, record to, or process changes from. If, on a purely IT system, that data becomes missing (as a result of corruption or malicious actions), a service might fail, a web portal might become unresponsive, or some other form of service interruption might occur. However, the consequences will be limited to some monetary loss, such as missed revenue or degraded Service-Level Agreement (SLA) numbers. On the OT side, however, if the data that a process relies on to properly function goes missing or becomes corrupted, the physical process can become unstable and the consequences can include physical damage to the environment or a threat to public health and safety, or can even be deadly for people operating the machinery that the OT equipment is controlling.
One final difference between IT and OT I want to touch on is the way they are being protected—the area users of the technology find most important. It is well known that the security paradigm for IT is the Confidentiality, Integrity, and Availability (CIA) triad. Not strictly a rule, but for IT systems, confidentiality (hiding data from prying eyes) is the most important factor to secure, followed by integrity (making sure data does not get tampered with), and the least important asset is availability (uptime). With OT systems, that triad is turned around—availability is the most important concern of the owners of an OT system, followed by integrity, and—finally—confidentiality.
The following screenshot depicts the CIA security triad:
Figure 1.15 – The CIA security triad
Knowing that some of the processes an OT/ICS system controls are expected to run flawlessly and uninterrupted for weeks—or even months—at a time, it is easy to see why here, the availability requirement is king. Integrity is a close second, as we want to make sure that data that the OT systems and operators are making decisions on is free from error or manipulation. Confidentiality is hardly a major concern in a typical OT environment, other than maybe with some historical data stored in a database or in log files on a server or PC. This is because by the time the data is used, it is pretty much worthless already; though as I said, stored production data, recipes, and—in some cases—the control applications that are stored on storage media do have some value, and therefore the C in the CIA triad should not be completely ignored.
The opportunities
Why did IT and OT convergence occur? Wouldn't it make much more sense to keep the two separated? Yes—from a security perspective, it would definitely make sense to keep things as separate as possible. However, from a business perspective, having accurate, on-the-fly, and relevant data coming from the OT environment makes a lot of sense. Such information allows tighter production scheduling, can decrease the amount of inventory that needs to be held on site, helps cost calculation, and provides many more logistical advantages. Modern ERP and MES systems rely on input and information from both the production and the enterprise side of a business. Those reasons—and many more—have driven the convergence of IT and OT systems.
The risk
As stated earlier in this chapter, the ICS (the OT environment) was originally built with, and around, proprietary devices, equipment, and networking media and protocols, without security in mind. Just about every vendor had their own way of doing things, and every one of them had a different way of configuring, operating, and maintaining their setup. This proprietary behavior did not work well with the whole IT/OT convergence demand, and slowly ICS equipment vendors started adhering to a common set of standards to run their controls and automation protocols over, namely the widely used networking protocols Ethernet, Internet Protocol (IP), Transmission Control Protocol (TCP), and User Datagram Protocol (UDP).
In order not to have to reinvent the wheel, many vendors layered their well-established controls and automation protocols on top of the TCP or UDP protocol. This way, all that was necessary to hop onto the IT/OT convergence train was to slap in an IP/TCP/UDP-capable communications module, and they were set to go. Seeing as most control systems are modular in nature, whereby the communications devices are separate from the central processing unit (CPU) (controller), this was an easy swap. Now, customers could more easily adapt to a standard that allowed a common set of technologies to wire up the entire production facility by the same type of wires, switches, and even the same skill set of the person doing the install.
And this is what many companies literally did—they wired the entire network, stretching from the production area, up through the offices, over the Wide Area Network (WAN) up to other plants or a corporate office, and sometimes even all the way onto the internet. This made it tremendously easy to get data, troubleshoot, and gain accessibility, but it also opened up these previously hidden devices and equipment—often running the most sensitive parts of a business—to attack. Controls and automation devices now have an IP address; they can be accessed from anywhere because they use the IP protocol, and they can be easily interpreted because the controls and automation protocols that ride on top of IP and TCP are just about all cleartext protocols.
Have a look at the following screenshot:
Figure 1.16 – ICS cleartext protocol packet
As the preceding screenshot shows, a Modbus packet (for example) does not hide the data or function codes it is relaying between nodes. This allows for inspection and manipulation of the data and function codes. Additionally, with the adoption of the TCP/IP stack by automation equipment, these devices can now be targeted with common IT techniques and tools.
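To illustrate how little stands between an observer and the contents of such a packet, the following sketch builds a Modbus/TCP Write Single Register request by hand, decodes it, and then changes the value being written. The transaction ID, unit ID, register address, and values are made up for the example, and the helper names are hypothetical. Only the frame layout (the MBAP header followed by function code 0x06, a register address, and a register value) comes from the Modbus specification.

```python
import struct

# MBAP header: transaction id, protocol id (0 = Modbus), length, unit id.
# PDU: function code 0x06 (Write Single Register), register address, value.
FRAME_FORMAT = ">HHHBBHH"  # big-endian, 12 bytes in total

# Hypothetical request: transaction 1, unit 1, write 1234 to register 100.
# The length field (6) counts the unit id plus the 5-byte PDU.
frame = struct.pack(FRAME_FORMAT, 1, 0, 6, 1, 0x06, 100, 1234)


def decode_write_single_register(data: bytes) -> dict:
    """Decode a 12-byte Modbus/TCP Write Single Register request."""
    tid, pid, length, unit, func, addr, value = struct.unpack(FRAME_FORMAT, data)
    return {
        "transaction_id": tid,
        "protocol_id": pid,
        "length": length,
        "unit_id": unit,
        "function_code": func,
        "register_address": addr,
        "register_value": value,
    }


def tamper_value(data: bytes, new_value: int) -> bytes:
    """Overwrite the register value, which is the last two bytes of the frame."""
    return data[:-2] + struct.pack(">H", new_value)


original = decode_write_single_register(frame)
tampered = decode_write_single_register(tamper_value(frame, 9999))
print(original)
print(tampered)
```

Nothing in the frame itself authenticates the sender or protects the payload, so the modified frame decodes just as cleanly as the original. An attacker positioned in the traffic path would still need to handle the TCP session, which this sketch does not attempt.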
Even today, most ICS equipment still uses the wide-open cleartext protocols that were originally designed for proprietary implementations. Many companies have, however, moved away from a flat network connecting every device from all over the organization. We will discuss this in more detail in a later chapter.
If you want to see the inherent insecurity of controls and automation protocols in action and even play with some attacks yourself, I refer you to Chapter 2 of this book's first edition – Insecure by Inheritance. This chapter has detailed explanations and attack examples for the Modbus, Siemens S7, PROFINET, and EtherNet/IP protocols.
Example attack on the Slumbertown papermill
By illustrating a cyber attack on a fictitious papermill, in the rolling hills of Slumbertown, Chapter 3 of the first edition of this book – The Attack walked us through a possible attack scenario whereby attackers infiltrated the organization and managed to create havoc in the production process.
Have a look at the following screenshot:
Figure 1.17 – The Slumbertown papermill ICS network architecture
The preceding screenshot is a depiction of the Slumbertown papermill ICS network that attackers managed to infiltrate and wreak havoc on. Next is a summary of the approach taken by the attackers.
Attack recap
From a high-level perspective, we will now look at the steps the attackers took to reach their objective.
Reconnaissance
The first step for an attacker is to gather as much information about their target as possible. This is called reconnaissance. Here, the attackers learned as much as they could about the operation of the papermill, the personal life of its employees, and the technology used in the ICS environment before they started their attack.
Spear phishing
The attackers crafted an enticing phishing email that they tailored in such a way as to get Mark to click on a malicious link that would ultimately compromise the computer Mark was logged in on, allowing the attackers to take a foothold on the papermill's enterprise network.
You can see the phishing email here:
Figure 1.18 – The email that started it all...
Next, the attackers started to probe around, looking for other victims to compromise.
Lateral movement on the Enterprise network
Once the attackers had a foothold on the Enterprise network of the papermill (the network that directly connects to the internet and has business-type clients with email and internet access on it), they started looking around for additional systems to compromise. They did this by using Mark's computer as a pivot into the network and running their probes and scans that way.
Using this method, they found and compromised other interesting computers, and ultimately made their way onto a system that due to being "dual homed" (connected to two networks: the enterprise network and the industrial network) allowed them access to the industrial environment.
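Dual-homed systems like the one the attackers used are worth hunting for in your own asset data. The sketch below flags any host that has an address in both an enterprise and an industrial subnet. The subnets, hostnames, and addresses are invented for illustration; in practice, the host list would come from your asset inventory.

```python
import ipaddress

# Hypothetical address ranges for the two networks.
ENTERPRISE_NET = ipaddress.ip_network("10.10.0.0/16")
INDUSTRIAL_NET = ipaddress.ip_network("192.168.100.0/24")

# Hypothetical inventory: hostname -> addresses of all its interfaces.
hosts = {
    "office-pc-01": ["10.10.4.23"],
    "eng-ws-01": ["10.10.8.15", "192.168.100.50"],
    "hmi-01": ["192.168.100.20"],
}


def is_dual_homed(addresses: list[str]) -> bool:
    """True when a host has interfaces on both networks."""
    ips = [ipaddress.ip_address(address) for address in addresses]
    on_enterprise = any(ip in ENTERPRISE_NET for ip in ips)
    on_industrial = any(ip in INDUSTRIAL_NET for ip in ips)
    return on_enterprise and on_industrial


dual_homed = sorted(name for name, addrs in hosts.items() if is_dual_homed(addrs))
print(dual_homed)
```

Any host this reports is a potential bridge around the IDMZ and deserves a closer look.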
Attacking the industrial network
With access to a workstation that is connected to the Industrial (production) network, the attackers were ready to start the true objective of their attack—phase 2: interruption of the papermill digester process, with the ultimate goal of causing physical damage. They achieved this objective by manipulating the cleartext packets sent from the control process to the operator screen. By changing the values that were presented to the operator, they tricked that operator into taking a corrective action that ultimately resulted in overpressurizing the digester…
Not much has changed
In the time between the release of the first edition of this book and the writing of this edition, just about every major ICS-centric compromise has followed the aforementioned process. The end goal of the attackers might have been different, but the steps taken to get there will have been pretty much the same.
If you want to read a detailed description of how the attack took place, and even follow along with the attack activities, head on over to Chapter 3 of the first edition of this book – The Attack.
The comprehensive risk management process
Securing the ICS environment ultimately comes down to managing risk. By identifying risk, categorizing risk, prioritizing risk, and ultimately mitigating risk, the ICS security posture is improved. The four major categories involved with risk management, as explained in detail in Chapter 4 of the first edition of this book – Industrial Control System Risk Assessments, are outlined next.
1. Asset identification and system characterization
Under the motto "you cannot secure and protect what you do not know you have", the first (and arguably most important) step to risk management is getting an accurate index of all your assets in the ICS environment. This can be a manual process whereby you open each and every electrical cabinet, look inside every network closet and panel, and inventory every desktop in your production facility. However, an automated approach might be easier and more comprehensive while also being less error-prone.
Tools such as the open source GRASSMARLIN (https://github.com/nsacyber/GRASSMARLIN), or one of the paid-for ICS-specific Intrusion Detection System (IDS) solutions (CyberX, Claroty, Nozomi, Forescout, Indegy, PAS Global LLC…) can passively index your assets by sniffing the network. Although these tools do a fantastic job, they can miss an asset if it sits in a hard-to-reach part of the network or is otherwise out of reach of the tools (for example, because it is offline). I suggest using a combination of sniffing tools, an off-hours scan with a properly configured Nmap, and some elbow-grease work of manually inventorying and indexing to get the best results.
After you have made a list of all the assets you have in the ICS environment, details such as operating system version, firmware revision, patch level, software inventory, and running services must be added to the list, as well as a criticality score and a value for the asset. A criticality score for an asset is a way to identify how important, valuable, and integral the asset is to the overall production process or the survivability of the organization. Criticality scoring will be discussed in detail in Chapter 15, Industrial Control System Risk Assessments. These asset details will help assess proper risk scoring and will ultimately allow for intelligent prioritization of risk mitigation.
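To make the shape of such an asset list more concrete, here is a minimal sketch of an inventory record. The Asset class, its field names, the 1-to-5 criticality scale, and the sample entries are assumptions made for illustration; they are not a prescribed format:

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One entry in an ICS asset inventory (field names are illustrative)."""
    name: str
    os_version: str
    firmware_revision: str
    patch_level: str
    software: list = field(default_factory=list)
    running_services: list = field(default_factory=list)
    criticality: int = 1  # hypothetical scale: 1 (low) to 5 (high)
    value: float = 0.0    # asset value, in a unit of your choosing


# Made-up entries for illustration only.
assets = [
    Asset("historian-01", "Windows Server 2016", "n/a", "2021-03", criticality=3),
    Asset("plc-digester", "n/a", "6.0.3", "n/a",
          running_services=["s7comm"], criticality=5),
    Asset("office-printer", "n/a", "1.2", "n/a", criticality=1),
]

# List the most critical assets first so that they are looked at first.
by_criticality = sorted(assets, key=lambda a: a.criticality, reverse=True)
print([a.name for a in by_criticality])
# ['plc-digester', 'historian-01', 'office-printer']
```

Whether you keep the list in a spreadsheet, a database, or a dedicated tool matters less than recording the same details consistently for every asset.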
2. Vulnerability identification
After a list of assets with accompanying software, firmware, and operating system patch levels and revisions is assembled, the next step is to compare these revisions, versions, and patch levels against known vulnerabilities for them. This can be a manual process where you use a website such as the National Vulnerability Database (https://nvd.nist.gov/) to look up every piece of information and compare it to their database of known vulnerabilities.
A more manageable approach would be to run an automated vulnerability scan with Nessus or Qualys. An automated vulnerability scan is faster and often more reliable, as it can find (and sometimes even verify) a large set of known vulnerabilities, as well as check for common misconfigurations or default (weak) settings. Be warned that a vulnerability scan is intense and can cause ICS equipment to buckle under the additional network traffic. I highly recommend running a scan such as this during production downtime, and being prepared to verify that the ICS equipment works as expected afterward.
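At its core, the comparison described in this step is a lookup of each asset's product and version against a list of known-vulnerable versions. The following sketch shows that idea only; the product names, version numbers, vulnerability identifiers, and record layout are all made up, and a real implementation would pull its data from a source such as the National Vulnerability Database or a scanner report:

```python
# Hypothetical asset inventory: product name and installed version per asset.
assets = [
    {"name": "plc-01", "product": "ExamplePLC", "version": "2.1"},
    {"name": "hmi-01", "product": "ExampleHMI", "version": "5.0"},
    {"name": "plc-02", "product": "ExamplePLC", "version": "3.0"},
]

# Hypothetical vulnerability records (identifiers are placeholders).
known_vulnerabilities = [
    {"id": "EXAMPLE-0001", "product": "ExamplePLC",
     "affected_versions": {"2.0", "2.1"}},
    {"id": "EXAMPLE-0002", "product": "ExampleHMI",
     "affected_versions": {"4.8"}},
]


def match_vulnerabilities(assets, vulnerabilities):
    """Return (asset name, vulnerability id) pairs where product and version match."""
    findings = []
    for asset in assets:
        for vuln in vulnerabilities:
            if (asset["product"] == vuln["product"]
                    and asset["version"] in vuln["affected_versions"]):
                findings.append((asset["name"], vuln["id"]))
    return findings


print(match_vulnerabilities(assets, known_vulnerabilities))
# [('plc-01', 'EXAMPLE-0001')]
```

Note that exact version matching like this is the simplest possible approach; real vulnerability data often describes affected version ranges, which require more careful comparison.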
3. Threat modeling
Now that we know what we have (asset list) and what is wrong with it (asset vulnerabilities), the next step is to see how likely the discovered vulnerabilities in our assets are to be exploited, and what the potential impact and consequence of successful exploitation would be. The process that helps us define this is called threat modeling. Threat modeling uses risk scenarios to define possible threat events and the impact and consequence of a threat event. For a threat event to be feasible, the following elements must be present: a threat source to carry out the event; a threat vector to exploit the vulnerability; and a target with a vulnerability. In a way, creating risk scenarios is about trying to predict where a threat is most likely going to target and strike. The following screenshot conceptualizes a risk scenario:
Figure 1.19 – Depiction of a risk scenario
Having a matrix of risk scenarios allows us to make an educated decision on which threats are more concerning than others and therefore allows us to prioritize and streamline remediation, giving us a better return on investment for the limited security budget that we have.
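The feasibility rule described earlier (a threat source, a threat vector, and a target with a vulnerability must all be present) can be sketched in a few lines of code. The RiskScenario class, its field names, and the sample scenarios below are hypothetical and only illustrate the idea:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskScenario:
    """A risk scenario; an element that is missing is represented by None."""
    threat_source: Optional[str]
    threat_vector: Optional[str]
    target: Optional[str]
    vulnerability: Optional[str]

    def is_feasible(self) -> bool:
        # A threat event is feasible only if every element is present.
        return all([self.threat_source, self.threat_vector,
                    self.target, self.vulnerability])


# Made-up scenarios for illustration only.
scenarios = [
    RiskScenario("external attacker", "spear phishing email",
                 "engineering workstation", "unpatched browser"),
    # No known vector to reach this target, so the event is not feasible.
    RiskScenario("external attacker", None,
                 "isolated PLC", "cleartext protocol"),
]

print([s.is_feasible() for s in scenarios])  # [True, False]
```

Scenarios that fail this check can be set aside, leaving a shorter matrix of feasible scenarios to score and prioritize.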
Important note
Additionally, to help define the likelihood of a threat event unfolding, you can perform a penetration test as part of the risk assessment. In short, a penetration test will take the created risk scenarios and try to actualize them by attacking the vulnerabilities within the confines of the risk scenario. Needless to say, penetration testing should not be performed on live ICS environments! A test environment or an approximation of the ICS environment should be built and used to run the penetration-testing activities on.
4. Risk calculation and mitigation planning
Now that we have a very clear picture of the possible risk scenarios for our ICS environment, we can next quantify the risk by assigning a risk score to every risk scenario we have created. Because every asset is assessed in the same way and scored relative to the others, the resulting score is a relative number that shows where mitigation efforts and money are best spent to get the best return on investment and the most impact.
For the scoring, we can use the following formula (others exist and can be used, as long as you are consistent):
As an example, this formula gives us the following risk score for a Siemens S7-400 PLC vulnerability:
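As a rough sketch of how such a score could be computed in code, the snippet below assumes a weighted average of a severity value, an asset criticality, a likelihood, and an impact. The shape of the formula, the weights, the input scales, and the sample values are all assumptions made for illustration, and they may differ from the formula used in this book:

```python
def risk_score(severity, criticality, likelihood, impact):
    """Hypothetical risk score on a 0-10 scale.

    Assumed input scales: severity 0-10 (for example, a CVSS base score);
    criticality, likelihood, and impact 1-5 each.
    """
    return (severity + 2 * criticality + 2 * likelihood + 2 * impact) / 4


# Made-up input values for illustration only.
print(risk_score(severity=7.5, criticality=4, likelihood=3, impact=5))  # 7.875
```

As noted previously, the exact formula matters less than applying the same one consistently to every risk scenario, so that the resulting scores can be compared with each other.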
Important note
To complement the risk assessment process that is described in detail in the first edition, this book will go into painful detail on the penetration testing process.
The DiD model
The idea behind the DiD model is that by stacking defenses, so that multiple security controls back each other up, a holistic and all-encompassing security posture is created for the entire ICS network.
The DiD model is presented in the following diagram:
Figure 1.20 – The DiD model
The several layers of the DiD model are briefly explained next. Chapters 6 through 11 of the first edition explain these layers in detail.
Policies and procedures
No security program is complete without proper direction. Policies and procedures do just that. They provide a way for management to give direction to the security program and portray the vision and objective of the security program.
Physical security controls
Controls in this layer limit physical access to areas such as cells/areas, control panels, devices, cabling, and the control room to authorized personnel, using measures such as locks, gates, key cards, and biometrics. This may also include administrative controls such as policies, procedures, and technology to escort and track visitors.
Network security controls
Controls that fall into this layer are aimed at defending the ICS network and the devices that sit on this network. Some controls include firewall policies, access control list (ACL) policies for switches and routers, Authentication, Authorization and Accounting (AAA), IDSes, and intrusion prevention systems (IPSes).
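To illustrate how an ACL policy decides what traffic is allowed, here is a minimal sketch that assumes first-match semantics with an implicit deny at the end of the list, as used by many switch and router ACL implementations. The subnets, rules, and the evaluate helper are hypothetical, and real devices have their own configuration syntax:

```python
from ipaddress import ip_address, ip_network

# Hypothetical rule set: (action, source network, destination network, port).
# A port of None matches any destination port.
ACL = [
    # Traffic within the industrial network is allowed.
    ("permit", "192.168.100.0/24", "192.168.100.0/24", None),
    # The IDMZ may reach the industrial network over HTTPS only.
    ("permit", "172.16.0.0/24", "192.168.100.0/24", 443),
    # The enterprise network may not reach the industrial network directly.
    ("deny", "10.10.0.0/16", "192.168.100.0/24", None),
]


def evaluate(src, dst, port):
    """Return the action of the first matching rule, or 'deny' if none match."""
    for action, src_net, dst_net, rule_port in ACL:
        if (ip_address(src) in ip_network(src_net)
                and ip_address(dst) in ip_network(dst_net)
                and rule_port in (None, port)):
            return action
    return "deny"  # implicit deny when no rule matches


print(evaluate("172.16.0.10", "192.168.100.30", 443))   # permit
print(evaluate("172.16.0.10", "192.168.100.30", 3389))  # deny
print(evaluate("10.10.5.21", "192.168.100.30", 502))    # deny
```

Because the first matching rule wins, the order of the rules matters: a broad permit placed above a specific deny would silently override it.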
Computer security controls
Controls within this layer are aimed at protecting and hardening the computer system and include patch management, endpoint protection solutions, the removal of unused applications/protocols/services, closing unnecessary logical ports, and protecting physical ports.
Application security controls
Controls within this layer aim to add controls at the application level of the ICS. The application level is where the end users interact with the system through application programming interfaces (APIs), portals, and other interfaces. Controls at this layer include AAA methods and solutions.
Device-level security controls
Controls in this layer are aimed at protecting the ICS device and include device patching, device hardening, physical and logical access restrictions, and setting up a device life cycle program that involves defining procedures for device acquisition, implementation, maintenance, change management, and device disposal.
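Since the strength of the DiD model comes from every layer contributing, a simple way to spot gaps is to list the controls you have in place per layer and flag the layers that have none. The following is a minimal sketch of that check; the layer names follow the sections above, while the recorded controls and the uncovered_layers helper are hypothetical:

```python
# The layers of the DiD model, in the order they are described above.
DID_LAYERS = [
    "policies and procedures",
    "physical",
    "network",
    "computer",
    "application",
    "device",
]

# Hypothetical record of the controls implemented so far, keyed by layer.
implemented_controls = {
    "policies and procedures": ["security policy"],
    "physical": ["key cards", "locked cabinets"],
    "network": ["firewall", "IDS"],
    "computer": ["patch management"],
    "application": [],
    # There is no entry at all for the "device" layer.
}


def uncovered_layers(controls):
    """Return the DiD layers that have no implemented controls."""
    return [layer for layer in DID_LAYERS if not controls.get(layer)]


print(uncovered_layers(implemented_controls))  # ['application', 'device']
```

A check like this says nothing about the quality of the controls in a layer; it only shows where a layer is missing entirely.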
ICS security program development
Security planning and security program development, including governance to define the policies and procedures for your unique environment and situation, should be a well-thought-out exercise, performed before any other security task. Before embarking on any kind of security activity, you should make a plan that fits your company's goals, needs, and requirements. Without the proper planning and guidance, implementing security becomes an exercise in aiming at a moving target.
Security program development and management
To be able to effectively integrate security into an ICS, we must define and execute a comprehensive cybersecurity program that addresses all aspects of security. The program should range from identifying the objectives of the program to the day-to-day operation and ongoing auditing and verification of the program and accompanying security posture for compliance and improvement purposes. An organization's business objectives should include a cybersecurity program, and the security program should be aligned with the organization's business objectives. This is paramount for the overall success of a security program.
Items to consider while setting up an industrial cybersecurity program include the following:
- Obtaining senior management buy-in
- Building and training a cross-functional team
- Defining the charter and scope
- Defining specific ICS policies and procedures
- Implementing an ICS security risk management framework
— Defining and inventorying ICS assets
— Developing a security plan for ICS systems
— Performing a risk assessment
— Defining the mitigation controls
- Providing training and raising security awareness for ICS staff
- Rinse and repeat—meaning you must indefinitely monitor, correct, and refine your security program to stay accurate, up to date, and effective
Risk management (cyclic activities to find and mitigate risk)
The following screenshot depicts the process and corresponding activities around the continuous (cyclic) industrial cybersecurity improvement process:
Figure 1.21 – The cyclic cybersecurity improvement process
Keeping an ICS security program and accompanying risk management activities accurate and up to date requires a cyclic sequence of activities.
These activities are outlined here:
- Assessing risk: To verify the completeness of the applied security controls and mitigation and to assess against the newest standards and policies, recurring risk assessments should be scheduled. The assessments can become increasingly involved as the overall security program evolves, to uncover more detailed and harder-to-spot vulnerabilities. A risk assessment should be completed once a year, at a minimum.
- Responding to identified risk: As risk is detected by a monitoring system or revealed by a risk assessment, it must be addressed by a (dedicated) team.
- Monitoring risk evolvement and mitigation: Monitoring risk is geared around keeping track of mitigation efforts on issues found during a risk assessment or discovered by a monitoring system such as an endpoint security client or an IDS/IPS sensor.
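The bookkeeping behind this cycle can be sketched in a few lines of code: check whether the next risk assessment is due, and keep track of which findings still need a response. The status values, the finding records, and both helper functions below are hypothetical and only illustrate the idea:

```python
from datetime import date, timedelta

# A risk assessment should be completed once a year, at a minimum.
MAX_ASSESSMENT_INTERVAL = timedelta(days=365)


def assessment_overdue(last_assessment, today):
    """Return True if more than a year has passed since the last assessment."""
    return today - last_assessment > MAX_ASSESSMENT_INTERVAL


def open_findings(findings):
    """Return the findings that have not been fully mitigated yet."""
    return [f for f in findings if f["status"] != "mitigated"]


# Made-up findings from a risk assessment or a monitoring system.
findings = [
    {"id": 1, "source": "risk assessment", "status": "mitigated"},
    {"id": 2, "source": "IDS sensor", "status": "in progress"},
    {"id": 3, "source": "endpoint client", "status": "open"},
]

print(assessment_overdue(date(2020, 1, 15), today=date(2021, 3, 1)))  # True
print([f["id"] for f in open_findings(findings)])                     # [2, 3]
```

In practice, this tracking usually lives in a ticketing or risk management system, but the questions it answers stay the same.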
Takeaway from the first edition
ICSs have evolved over the past few decades from standalone islands of automation to entire networks of automation devices, computer and server systems, and the media connecting them. Nowadays, IT and OT equipment is used in an intertwined fashion to perform a specific business goal such as building a product, supplying a service, or maintaining an environmental variable such as temperature, humidity, or stability. An ICS has become the backbone of almost every industry and will cause severe consequences to the uptime, productivity, and profitability of a company when it becomes unavailable, as well as possibly causing environmental and physical damage and even resulting in bodily harm or death if tampered with.
This extreme dependency on an ICS's reliable functioning, coupled with the added exposure to cybersecurity threats resulting from IT and OT convergence, makes maintaining a proper cybersecurity posture a matter of due diligence for every ICS owner.
The combination of the tremendously high impact of compromise to an ICS and the ability to achieve this remotely using standard IT malware has caused ICS cyber attacks to proliferate over the past two decades. We are all aware of high-impact cyber attacks on critical infrastructure carried out (allegedly) by nation-state actors—nuclear facilities, power grids, oil industry. None of these are immune.
In the first edition of this book, we learned how an attacker will go about infiltrating, exploiting, and taking over an ICS environment. We learned the tools and techniques used for this process, and looked at the underlying issues and inherent weaknesses in how ICS equipment operates that allow these tools and attacks to successfully compromise an ICS environment. The first edition then presented the concepts, ideas, and fundamentals necessary to understand what it takes to secure an ICS environment, covering topics such as DiD, security program development, and risk management.
As a summary, we will look at the four main tasks or responsibilities that should be considered/covered to successfully establish a well-functioning and effective ICS cybersecurity program.
Know what you have
Having an up-to-date, complete, and accurate inventory of assets that comprise your ICS is arguably the most important step in a cybersecurity program. You cannot secure and protect what you don't know you have.
This task requires a well-planned and effective asset management program.
Know what is wrong with what you have
You then have to know what is wrong with the assets that you have. How else are you going to fix it?
This task requires you to define a comprehensive vulnerability management program.
Fix or defend what you know is wrong
Once you have identified what is wrong with the assets that you have, you must make a mitigation plan that is targeted and complete and that gives you the best return on investment while tackling the undoubtedly overwhelming amount of risk to deal with.
This task requires you to set up a complete and detailed risk management program.
Rinse and repeat indefinitely
To keep your ICS cybersecurity activities and management programs up to date and effective, you need to periodically review the processes, activities, and implemented solutions and controls for completeness, effectiveness, and relevance.
This task requires a recurring sequence of events to be defined, tying all cybersecurity programs and activities together in a never-ending loop of assess, respond, and monitor…
The remainder of this book will be dedicated to the technologies, techniques, concepts, activities, and responsibilities for monitoring the security of the ICS environment, or security monitoring for short.