Intelligence data processing
Raw data holds no meaning until it is converted into information that the organization can act on. Data is often called the new oil, and every organization collects a fair amount of it in various forms; security companies, in particular, gather big data in the form of logs, scans, assessments, and statistics. The aim of this step is to process and format the collected big data into a readable, easy-to-understand arrangement. However, it is difficult, if not impossible, for an analyst to manually or single-handedly mine the collected data to build intelligence effectively. Therefore, processing the collected data needs to be automated using intelligence platforms. This will be covered in detail in Chapter 5, Goals Setting, Procedures for the CTI Strategy, and Practical Use Cases.
Several intelligence frameworks and structured models can be used to process intelligence data dynamically. During the processing task, the analyst uses one or more of these frameworks to organize the data into different buckets or storage units. Imagine a bank being targeted by several adversaries simultaneously; it is unlikely that threat analysts will detect and prevent all those threats manually. Structured models and frameworks help identify patterns in the data and intersection points between the different sources, making it possible to understand how the adversaries operate.
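The bucketing idea above can be sketched in a few lines of code. The following is a minimal illustration, not a real feed format: the observables, field names, and Kill Chain phase labels are all hypothetical, and the intersection check simply flags indicators reported by more than one source.

```python
from collections import defaultdict

# Hypothetical observables reported by two collection sources; the field
# names and values are illustrative, not taken from any specific feed format.
source_a = [
    {"indicator": "198.51.100.7", "type": "ip", "phase": "delivery"},
    {"indicator": "evil.example.com", "type": "domain", "phase": "command-and-control"},
]
source_b = [
    {"indicator": "198.51.100.7", "type": "ip", "phase": "delivery"},
    {"indicator": "a1b2c3", "type": "hash", "phase": "installation"},
]

def bucket_by_phase(*sources):
    """Group indicators from all sources into Kill Chain phase buckets."""
    buckets = defaultdict(set)
    for source in sources:
        for obs in source:
            buckets[obs["phase"]].add(obs["indicator"])
    return buckets

def intersections(a, b):
    """Indicators seen by both sources -- likely higher-confidence signals."""
    return {o["indicator"] for o in a} & {o["indicator"] for o in b}

buckets = bucket_by_phase(source_a, source_b)
print(sorted(buckets["delivery"]))        # indicators tied to the delivery phase
print(intersections(source_a, source_b))  # overlap between the two feeds
```

An indicator that lands in the same bucket from two independent sources is exactly the kind of intersection point the frameworks are designed to surface.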
Security information and event management (SIEM) tools are widely used to facilitate intelligence data processing and exploration. SIEM will be studied in detail in Chapter 12, SIEM Solutions and Intelligence-Driven SOCs. These tools provide a holistic view of the entire security system by correlating data from different sources, which makes them a great starting point for data processing and transformation. However, intelligence platforms and frameworks also allow us to perform intelligence data processing and exploration, especially when dealing with unstructured data from different sources or vendors. Some platforms now support machine learning to identify threats in the data. Frameworks such as MITRE ATT&CK, the Diamond model, and the Kill Chain can all be used to structure intelligence data processing.
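At its core, the correlation that SIEM tools perform is a join across log sources on a shared attribute. The sketch below is a toy version of that idea, assuming made-up firewall and proxy event shapes; a real SIEM does this at scale, continuously, with far richer schemas.

```python
# Minimal sketch of SIEM-style correlation: join events from two log
# sources on a shared attribute (source IP) to build one combined view.
# The event shapes are hypothetical.
firewall_logs = [
    {"src_ip": "203.0.113.9", "action": "allow", "dst_port": 443},
    {"src_ip": "198.51.100.7", "action": "deny", "dst_port": 22},
]
proxy_logs = [
    {"src_ip": "203.0.113.9", "url": "http://evil.example.com/payload"},
]

def correlate(firewall, proxy):
    """Index both sources by IP, then keep IPs observed in more than one source."""
    by_ip = {}
    for e in firewall:
        by_ip.setdefault(e["src_ip"], {"firewall": [], "proxy": []})["firewall"].append(e)
    for e in proxy:
        by_ip.setdefault(e["src_ip"], {"firewall": [], "proxy": []})["proxy"].append(e)
    return {ip: v for ip, v in by_ip.items() if v["firewall"] and v["proxy"]}

hits = correlate(firewall_logs, proxy_logs)
print(list(hits))  # IPs seen by both the firewall and the proxy
```

An IP that appears in both sources carries more context than either log line alone, which is the "holistic view" the paragraph describes.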
Using the Diamond model and the example provided in the previous section, a cyber threat intelligence SIEM can model the described threat in terms of the model's four vertices: the adversary (the threat creator), the victim of the trojan (the employee and the system where it is implanted), the capability (the tactics and techniques used by the adversary to compromise the system), and the infrastructure (the way the threat accessed the system, through an email attachment). The model correlates these four pieces and extracts commonalities to profile the adversary and initiate the appropriate actions.
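The four vertices map naturally onto a simple record type. The following sketch encodes the trojan example as a Diamond event and compares two events to find shared vertices; the class and field names are my own illustration, not part of the Diamond model specification.

```python
from dataclasses import dataclass

# Sketch of the Diamond model's four vertices as a simple record.
# The values describe the trojan example from the text; names are hypothetical.
@dataclass
class DiamondEvent:
    adversary: str       # who created or deployed the threat
    capability: str      # tactics, techniques, and tooling used
    infrastructure: str  # how the threat reached the system
    victim: str          # the targeted employee/system

event = DiamondEvent(
    adversary="unknown threat actor",
    capability="trojan with privilege escalation",
    infrastructure="email attachment (phishing)",
    victim="employee workstation",
)

def shared_vertices(a: DiamondEvent, b: DiamondEvent):
    """Vertices two events have in common; commonalities help profile the adversary."""
    fields = ("adversary", "capability", "infrastructure", "victim")
    return [f for f in fields if getattr(a, f) == getattr(b, f)]
```

Two intrusions that share, say, the same adversary and infrastructure vertices are strong candidates for being the same campaign, which is the kind of commonality extraction described above.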
The MITRE ATT&CK framework, by contrast, focuses more on the adversary's tactics and techniques and identifies the threat's impact on the system. The most typical components the framework extracts include the method used by the malware to access the system (in our case, an email attachment, that is, phishing), the execution method (through double-clicking), the capabilities of the threat (privilege escalation, persistence, credential theft, and so on), its direct impact on the system, and more.
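A simple way to picture this is a hand-built mapping from observed behaviors to ATT&CK tactics and technique IDs. The IDs below are real ATT&CK identifiers (for example, T1566 is Phishing), but the behavior strings and the mapping itself are an illustration, not output from the ATT&CK knowledge base or any API.

```python
# Hand-built illustration: map the observed behaviors from the trojan
# example onto ATT&CK (tactic, technique ID) pairs. The technique IDs are
# real ATT&CK IDs, but this mapping is a sketch, not a query against the
# ATT&CK knowledge base.
observed_behaviors = {
    "email attachment opened": ("Initial Access", "T1566"),   # Phishing
    "user double-clicked payload": ("Execution", "T1204"),    # User Execution
    "token manipulation": ("Privilege Escalation", "T1134"),
    "registry run key added": ("Persistence", "T1547"),
    "browser passwords dumped": ("Credential Access", "T1555"),
}

def tactics_covered(behaviors):
    """Which ATT&CK tactics the observed behaviors span."""
    return sorted({tactic for tactic, _ in behaviors.values()})

print(tactics_covered(observed_behaviors))
```

Seeing which tactic columns an intrusion touches gives a quick picture of the threat's footprint across the attack lifecycle.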
In both cases, we can see that the frameworks correlate different data to produce structured, meaningful information. For example, to establish that initial access was gained through phishing, it is vital to have email-related data (links, sender, receiver, attachments, associated IP address, domain, and so on), which helps the organization pivot through different data sources to analyze the threat. A link can thus already be established between data collection (what data is available or being collected) and processing.
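Pivoting from one email's metadata into other data sources can be sketched as a pair of lookups. Everything here (the email fields, the DNS and endpoint records, the `pivot` helper) is invented for illustration; the point is only that each email attribute becomes a key into a different collection source.

```python
# Illustrative pivot: starting from one phishing email's metadata, look up
# related records in other data sources (DNS logs, endpoint alerts). All
# records and field names are made up for this sketch.
email = {
    "sender": "billing@evil.example.com",
    "attachment_hash": "a1b2c3",
    "link_domain": "evil.example.com",
    "src_ip": "198.51.100.7",
}
dns_logs = [
    {"host": "ws-042", "query": "evil.example.com"},
    {"host": "ws-100", "query": "benign.example.org"},
]
endpoint_alerts = [
    {"host": "ws-042", "file_hash": "a1b2c3", "alert": "suspicious process"},
]

def pivot(email, dns_logs, endpoint_alerts):
    """Hosts tied to the email's link domain or attachment hash across sources."""
    hosts = {e["host"] for e in dns_logs if e["query"] == email["link_domain"]}
    hosts |= {a["host"] for a in endpoint_alerts
              if a["file_hash"] == email["attachment_hash"]}
    return sorted(hosts)

print(pivot(email, dns_logs, endpoint_alerts))  # hosts needing investigation
```

Note that the pivot only works if DNS and endpoint data were collected in the first place, which is the collection-to-processing link the paragraph describes.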
The processing phase also addresses the storage problem. Since a lake of raw intelligence data is created in step 2 (intelligence data collection), a warehouse of processed data needs to be built in step 3 (intelligence data processing). The CTI team should be able to store the data effectively so that information can be accessed and retrieved easily as required. Specific CTI platforms, as we will see in Chapter 3, Cyber Threat Intelligence Frameworks, provide fast storage capabilities. Depending on the objectives and the requirements that have been set, an organization can choose to store processed intelligence information in the cloud or on-premises. It is crucial to evaluate and select the right approach from the early phases (step 1, planning and direction).
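The lake-versus-warehouse distinction can be made concrete with a minimal sketch: raw collection stays wherever it landed, while processed indicators go into an indexed store for fast retrieval. The schema below is illustrative, assuming a simple normalized indicator table; production warehouses use purpose-built platforms rather than an in-memory SQLite database.

```python
import sqlite3

# Minimal "warehouse" sketch for processed intelligence: store normalized
# indicators with a primary key and an index so analysts can retrieve them
# quickly. The schema and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indicators ("
    "  value TEXT PRIMARY KEY,"
    "  type TEXT NOT NULL,"
    "  first_seen TEXT,"
    "  source TEXT)"
)
conn.execute("CREATE INDEX idx_type ON indicators(type)")

rows = [
    ("198.51.100.7", "ip", "2023-01-10", "firewall"),
    ("evil.example.com", "domain", "2023-01-11", "proxy"),
]
conn.executemany("INSERT INTO indicators VALUES (?, ?, ?, ?)", rows)

# Easy, fast retrieval as required downstream: all IP indicators.
ips = conn.execute("SELECT value FROM indicators WHERE type = 'ip'").fetchall()
print(ips)  # [('198.51.100.7',)]
```

Whether this store lives in the cloud or on-premises is exactly the decision the text says should be made back in the planning and direction phase.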
Another important feature to consider when selecting a CTI framework is the capability to process data in different languages. This can be a deciding factor when setting up and integrating an intelligence project, as it allows the CTI team or analyst to go beyond the language barrier.
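To give a flavor of what multi-language handling involves, here is a deliberately crude heuristic that routes a report by its dominant Unicode script. Real CTI platforms use proper language detection and translation; this toy function is only meant to show why a report in, say, Cyrillic needs a different pipeline than one in Latin script.

```python
import unicodedata

# Toy heuristic, not production language detection: classify text by the
# dominant Unicode script of its alphabetic characters.
def dominant_script(text):
    scripts = {}
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            script = name.split()[0]  # e.g., 'LATIN', 'CYRILLIC', 'CJK'
            scripts[script] = scripts.get(script, 0) + 1
    return max(scripts, key=scripts.get) if scripts else "UNKNOWN"

print(dominant_script("Вредоносная программа"))  # CYRILLIC
print(dominant_script("malware report"))         # LATIN
```

A framework without this capability would leave non-English reports sitting unprocessed in the lake, which is precisely the barrier the paragraph warns about.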
In this step, the CTI team or analyst must set up the tools, frameworks, and platforms that efficiently process raw intelligence data and store the resulting information in an easy-to-access, easy-to-retrieve repository (considering the capabilities of the underlying tools).