Introduction to cryptography
Techniques to protect secrets have been around for centuries. The earliest attempts to secure and hide data from adversaries date back to ancient inscriptions discovered on monuments in Egypt, which used a special alphabet known only to a few trusted people. This early form of security is called security through obscurity and is still used in different forms today. For this method to work, it is critical to protect the secret, which in this example is the meaning of the alphabet. Later, finding foolproof ways of protecting important messages became critical in both World War One and World War Two. In the late 20th century, with the introduction of electronics and computers, sophisticated algorithms were developed to secure data, giving rise to the modern field of cryptography. This chapter discusses the algorithmic aspects of cryptography. One of the uses of these algorithms is to allow secure data exchange between two processes or users. Cryptographic algorithms use mathematical functions to achieve the stated security goals.
First, we will look at the importance of “the weakest link” in the infrastructure.
Understanding the importance of the weakest link
Sometimes, when architecting the security of digital infrastructure, we put too much emphasis on the security of individual entities and don’t pay the necessary attention to end-to-end security. This can result in us overlooking some loopholes and vulnerabilities in the system, which can later be exploited by hackers to access sensitive data. The important point to remember is that a digital infrastructure, as a whole, is only as strong as its weakest link. For a hacker, this weakest link can provide backdoor access to sensitive data in the digital infrastructure. Beyond a certain point, there is not much benefit in fortifying the front door without closing all the back doors.
As the algorithms and techniques for keeping digital infrastructure secure become more and more sophisticated, attackers keep upgrading their techniques as well. It is always important to remember that one of the easiest ways for attackers to hack digital infrastructure is by exploiting its weakest links to access sensitive information.
In 2014, a cyber attack on a Canadian federal research institute—the National Research Council (NRC)—is estimated to have cost hundreds of millions of dollars. The attackers were able to steal decades of research data and intellectual property material. They used a loophole in the Apache software that was used on the web servers to gain access to the sensitive data.
In this chapter, we will highlight the vulnerabilities of various encryption algorithms.
Let’s first look at the basic terminology used.
The basic terminology
Let’s look at the basic terminology related to cryptography:
- Cipher: An algorithm that performs a particular cryptographic function.
- Plain text: The plain data, which can be a text file, a video, a bitmap, or a digitized voice. In this chapter, we will represent plain text as P.
- Cipher text: The scrambled text that is obtained after applying cryptography to the plain text. In this chapter, we will represent this as C.
- Cipher suite: A set or suite of cryptographic software components. When two separate nodes want to exchange messages using cryptography, they first need to agree on a cipher suite. This is important in ensuring that they use exactly the same implementation of the cryptographic functions.
- Encryption: The process of converting plain text, P, into cipher text, C, is called encryption. Mathematically, it is represented by encrypt(P) = C.
- Decryption: The process of converting cipher text back into plain text. Mathematically, it is represented by decrypt(C) = P.
- Cryptanalysis: The methods used to analyze the strength of cryptographic algorithms. The analyst tries to recover the plain text without access to the secret.
- Personally Identifiable Information (PII): PII is information that can be used to trace an individual’s identity when used alone or with other relevant data. Some examples include protected information, such as a social security number, date of birth, or mother’s maiden name.
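The notation encrypt(P) = C and decrypt(C) = P from the list above can be illustrated with a deliberately trivial cipher. The following is a minimal sketch, not a secure scheme: the single-byte XOR operation and the `key` parameter are assumptions chosen only to show that decryption reverses encryption.

```python
def encrypt(plain_text: bytes, key: int) -> bytes:
    """Toy cipher: XOR every byte with a single-byte key (not secure)."""
    return bytes(b ^ key for b in plain_text)


def decrypt(cipher_text: bytes, key: int) -> bytes:
    """XOR is its own inverse, so decryption repeats the same operation."""
    return bytes(b ^ key for b in cipher_text)


P = b"CALM"
key = 0x2A  # hypothetical secret shared by both parties

C = encrypt(P, key)
print(C != P)                 # the cipher text differs from the plain text
print(decrypt(C, key) == P)   # decrypt(encrypt(P)) gives back P
```

Both lines print True, which is the round-trip property that every cipher discussed in this chapter is expected to satisfy.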
Let us first understand the security needs of a system.
Understanding the security requirements
It is important to first understand the exact security needs of a system. Understanding this will help us use the correct cryptographic technique and discover the potential loopholes in a system.
One way of developing a better understanding of the security needs of a system is by answering the following four questions:
- Which individuals or processes need to be protected?
- Who are we protecting the individuals and processes from?
- Where should we protect them?
- Why are we protecting them?
Let us take the example of a Virtual Private Cloud (VPC) in the AWS cloud. A VPC allows us to create a logically isolated network to which resources such as virtual machines can be added. In order to understand the security requirements of a VPC, it is important to first make those four questions concrete:
- How many individuals are planning to use this system?
- What sort of information needs to be protected?
- Should we protect the VPC only, or are we passing a message to the system that needs to be encrypted and communicated to the VPC?
- What is the security classification of the data? What are the potential risks? Why would anyone have an incentive to try to hack the system?
Most of the answers to these questions will come by performing the following three steps:
- Identify the entities.
- Establish the security goals.
- Understand the sensitivity of the data.
Let’s look at these steps one by one.
Step 1: Identifying the entities
An entity can be defined as an individual, a process, or a resource that is part of an information system. We first need to identify which users, resources, and processes are present at runtime. Then, we will quantify the security needs of these identified entities, either individually or as a group.
Once we better understand these requirements, we can establish the security goals of our digital system.
Step 2: Establishing the security goals
The goal of designing a security system is to protect information from being stolen, compromised, or attacked. Cryptographic algorithms are typically used to meet one or more security goals:
- Authentication: Authentication is a mechanism by which we ascertain the identity of a user, device, or system, confirming that they are indeed what or who they claim to be.
- Authorization: Authorization is the process of giving the user permission to access a specific resource or function.
- Confidentiality: Data that needs to be protected is called sensitive data. Confidentiality is the concept of restricting sensitive data to authorized users only. To protect the confidentiality of sensitive data during its transit or in storage, you need to render the data so that it is unreadable except by authorized users. This is accomplished by using encryption algorithms, which we will discuss later on in this chapter.
- Integrity: Integrity is the process of establishing that data has not been altered in any way during its transit or storage. For example, TCP/IP (Transmission Control Protocol/Internet Protocol) uses checksum or Cyclic Redundancy Check (CRC) algorithms to verify data integrity.
- Non-repudiation: Non-repudiation is the ability to produce unforgeable and irrefutable evidence that a message was sent or received. This evidence can be used later to prove the receipt of data.
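As a small illustration of the integrity goal, the sketch below uses a CRC checksum, as mentioned in the list above, to detect that data has changed. It relies on `zlib.crc32` from the Python standard library; the message contents and the `is_intact` helper are hypothetical. Note that a CRC is designed to catch accidental corruption and does not by itself protect against a deliberate attacker, who could simply recompute it.

```python
import zlib

message = b"Transfer 100 dollars to account 42"
checksum = zlib.crc32(message)  # computed by the sender and sent along


def is_intact(received: bytes, expected_checksum: int) -> bool:
    """Recompute the CRC on the receiver side and compare it."""
    return zlib.crc32(received) == expected_checksum


corrupted = b"Transfer 900 dollars to account 42"  # one byte changed

print(is_intact(message, checksum))    # True
print(is_intact(corrupted, checksum))  # False
```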
Step 3: Understanding the sensitivity of the data
It is important to understand the classified nature of data. Data is categorized by regulatory authorities such as governments, agencies, or organizations based on how serious the consequence will be if it is compromised. The categorization of the data helps us choose the correct cryptographic algorithm. There is more than one way to categorize data, based on the sensitivity of the information it contains. Let’s look at the typical ways of classifying data:
- Public data or unclassified data: Anything that is available for consumption to the public, for example, information found on a company’s website or a government’s info portal.
- Internal data or confidential data: Although not for public consumption, exposing this data to the public may not have damaging consequences. For example, if an employee’s emails complaining about their manager are exposed, it may be embarrassing for the company but this may not have damaging consequences.
- Sensitive data or secret data: Data that is not supposed to be for public consumption and exposing it to the public could have damaging consequences for an individual or an organization. For example, leaking the details of a future iPhone may harm Apple’s business goals and could give an advantage to rivals, such as Samsung.
- Highly sensitive data: Also called top-secret data. This is information that, if disclosed, would be extremely damaging to the organization. Examples of highly sensitive data include proprietary research, strategic business plans, or internal financial data.
Top-secret data is protected through multiple layers of security and requires special permission to access it.
In general, more sophisticated security designs tend to be slower than simpler ones. It is important to strike the right balance between the security and the performance of the system.
Understanding the basic design of ciphers
Designing ciphers is about coming up with an algorithm that can scramble sensitive data so that a malicious process or an unauthorized user cannot access it. Although, over time, ciphers have become more and more sophisticated, the underlying principles that ciphers are based on remain unchanged.
Let’s start by looking at some relatively simple ciphers that will help us understand the underlying principles that are used in the design of cryptographic algorithms.
Presenting substitution ciphers
Substitution ciphers have been in use for hundreds of years in various forms. As the name indicates, substitution ciphers are based on a simple concept—substituting characters in plain text with other characters in a predetermined, organized way.
Let’s look at the exact steps involved in this:
- First, we map each character to a substitute character.
- Then, we encode and convert the plain text into cipher text by replacing each character in the plain text with another character in the cipher text using substitution mapping.
- To decode, we recover the plain text by applying the substitution mapping in reverse.
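The three steps above can be sketched with Python's built-in `str.maketrans` and `str.translate`. The mapping used here, the reversed alphabet, is only an assumption for illustration; any one-to-one mapping of characters would work the same way.

```python
import string

alphabet = string.ascii_uppercase
# Hypothetical mapping: the reversed alphabet (A->Z, B->Y, ..., Z->A)
substitutes = alphabet[::-1]

# Step 1: map each character to a substitute character
encode_table = str.maketrans(alphabet, substitutes)
decode_table = str.maketrans(substitutes, alphabet)

# Step 2: encode the plain text using the mapping
P = "CALM"
C = P.translate(encode_table)
print(C)  # XZON

# Step 3: decode by applying the reverse mapping
print(C.translate(decode_table))  # CALM
```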
The following are examples of substitution-based ciphers:
- Caesar cipher
- Rotation 13
Let us look into them in more detail.
Caesar cipher
Caesar ciphers are based on substitution mapping. Substitution mapping changes the actual string in a deterministic way by applying a simple formula that is kept secret.
The substitution mapping is created by replacing each character with the third character to the right of it. This mapping is described in the following diagram:
![The substitution mapping of Caesar ciphers](https://static.packt-cdn.com/products/9781803247762/graphics/Images/B18046_14_01.png)
Figure 14.1: The substitution mapping of Caesar ciphers
Let’s see how we can implement a Caesar cipher using Python:
rotation = 3
P = 'CALM'; C = ''
for letter in P:
    # Shift each uppercase letter, wrapping around from Z back to A
    C = C + chr((ord(letter) - ord('A') + rotation) % 26 + ord('A'))
We can see that we applied a Caesar cipher to the plaintext CALM.
Let’s print the cipher text after encrypting it with the Caesar cipher:
print(C)
FDOP
Caesar ciphers are said to have been used by Julius Caesar to communicate with his advisers.
A Caesar cipher is a simple cipher and is easy to implement. The downside is that it is not too difficult to crack, as a hacker could simply iterate through all the possible shifts of the alphabet (all 26 of them) and see if any coherent message appears. Given the current processing abilities of computers, this is a very small number of combinations to try. It should not be used to protect highly sensitive data.
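A reusable version of the shift can make both directions explicit. The following is a minimal sketch, assuming we want to handle lowercase letters too and leave spaces and punctuation unchanged; the function name `caesar` is hypothetical. Decryption is simply a shift by the negative rotation.

```python
def caesar(text: str, rotation: int) -> str:
    """Shift ASCII letters by `rotation`, wrapping around the alphabet."""
    result = []
    for letter in text:
        if letter.isascii() and letter.isalpha():
            base = ord('A') if letter.isupper() else ord('a')
            result.append(chr((ord(letter) - base + rotation) % 26 + base))
        else:
            result.append(letter)  # leave non-letters untouched
    return ''.join(result)


C = caesar("XYZ CALM", 3)
print(C)              # ABC FDOP
print(caesar(C, -3))  # XYZ CALM
```

The modulo operation is what makes X, Y, and Z wrap around to A, B, and C instead of running past the end of the alphabet.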
Rotation 13 (ROT13)
ROT13 is a special case of the Caesar cipher where the substitution mapping is created by replacing each character with the 13th character to the right of it. The following diagram illustrates this:
![Workings of ROT13](https://static.packt-cdn.com/products/9781803247762/graphics/Images/B18046_14_02.png)
Figure 14.2: Workings of ROT13
This means that ROT13 can be implemented like the Caesar cipher, with the rotation set to 13:
rotation = 13
P = 'CALM'; C = ''
for letter in P:
    # Shift each uppercase letter by 13, wrapping around from Z back to A
    C = C + chr((ord(letter) - ord('A') + rotation) % 26 + ord('A'))
Now, let’s print the encoded value of C:
print(C)
PNYZ
ROT13 is actually not used to accomplish data confidentiality. It is used more to mask text, for example, to hide potentially offensive text. It can also be used to avoid giving away the answer to a puzzle, and in other similar use-cases.
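Because the English alphabet has 26 letters, rotating by 13 twice brings every letter back to where it started, so the same operation both masks and unmasks text. The short sketch below shows this using the `rot_13` text transform that ships with Python's standard `codecs` module.

```python
import codecs

P = "CALM"
C = codecs.encode(P, "rot_13")
print(C)                           # PNYZ
print(codecs.encode(C, "rot_13"))  # CALM
```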
Cryptanalysis of substitution ciphers
Substitution ciphers are simple to implement and understand. Unfortunately, they are also easy to crack. For rotation-based substitution ciphers such as the Caesar cipher and ROT13 over the English alphabet, all we need to determine to crack the cipher is the size of the rotation. We can try each possible rotation one by one until the decrypted text is readable. This means that it will take at most 25 attempts to reconstruct the plain text.
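This brute-force attack can be sketched in a few lines. The helper names `shift` and `brute_force` are hypothetical, and the sketch assumes the cipher text contains uppercase English letters only. In practice, an attacker would scan the candidates for the one that reads as a real word or sentence.

```python
def shift(text: str, rotation: int) -> str:
    """Rotate uppercase letters by `rotation`, wrapping around the alphabet."""
    return ''.join(
        chr((ord(letter) - ord('A') + rotation) % 26 + ord('A'))
        for letter in text
    )


def brute_force(cipher_text: str) -> dict:
    """Try every non-trivial rotation and return all candidate plain texts."""
    return {r: shift(cipher_text, -r) for r in range(1, 26)}


candidates = brute_force("FDOP")
for rotation, candidate in candidates.items():
    print(rotation, candidate)

print(candidates[3])  # CALM
```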
Now, let’s look at another type of simple cipher—transposition ciphers.
Understanding transposition ciphers
In transposition ciphers, the characters of the plain text are encrypted using transposition. Transposition is a method of encryption where we scramble the position of the characters using deterministic logic. A transposition cipher writes characters into rows in a matrix and then reads the columns as output. Let’s look at an example.
Let’s take the plain text (P) Ottawa Rocks.
First, let’s encode P. For that, we will use a 3 x 4 matrix and write in the plaintext horizontally:
| 1 | 2 | 3 | 4 |
|---|---|---|---|
| O | t | t | a |
| w | a | R | o |
| c | k | s |   |
The read process will read the string vertically, which will generate the cipher text OwctaktRsao. The key would be {1,2,3,4}, which is the order in which the columns are read. Encrypting with a different key, say, {2,4,3,1}, would result in a different cipher text, in this case, takaotRsOwc.
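The write-by-rows, read-by-columns procedure can be sketched as follows. The function name `transpose_encrypt` and its parameters are hypothetical, and the sketch assumes that spaces are dropped before the text is written into the matrix, as in the Ottawa Rocks example.

```python
def transpose_encrypt(plain_text: str, key: list, columns: int = 4) -> str:
    """Write the text row by row, then read the columns in key order."""
    letters = plain_text.replace(" ", "")
    # Split the text into rows of `columns` characters (last row may be short)
    rows = [letters[i:i + columns] for i in range(0, len(letters), columns)]
    cipher = []
    for col in key:                  # key holds 1-based column numbers
        for row in rows:
            if col - 1 < len(row):   # skip the empty cells of a short row
                cipher.append(row[col - 1])
    return "".join(cipher)


print(transpose_encrypt("Ottawa Rocks", [1, 2, 3, 4]))  # OwctaktRsao
print(transpose_encrypt("Ottawa Rocks", [2, 4, 3, 1]))  # takaotRsOwc
```

Changing only the key changes the cipher text, even though the matrix itself stays the same.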
The Germans used a cipher named ADFGVX in the First World War, which combined transposition and substitution. It was cracked in 1918 by the French cryptanalyst Georges Painvin.
So, these are some of the types of ciphers. In general, ciphers use a key to code plain text. Now, let’s look at some of the cryptographic techniques that are currently used. Cryptography protects a message using encryption and decryption processes, as discussed in the next section.