Getting a primer
Vulnerability analysis and software exploitation are closely related, well-known topics in cybersecurity. The purpose of this book is to look for security bugs in embedded firmware through emulation and then search for ways to exploit (take advantage of) those vulnerabilities. There are various types of security flaws. The best-known and often exploitable bug is the buffer overflow, where a missing or incorrect bounds check lets user-provided data be written past the end of a program buffer, which in some cases allows that user to execute code inside the process memory. In the cybersecurity world, the code that’s injected and run by exploiting such a vulnerability is known as shellcode. The name comes from the classic payload that spawns a shell to run commands, but that isn’t the only option: an attacker can be creative and execute other code to gain a foothold on a machine.
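To make the buffer overflow idea concrete, the following is a minimal Python sketch that simulates process memory as a flat byte array: a small buffer followed directly by a saved return address. The memory layout, the sizes, and the function names (make_frame, unchecked_copy, checked_copy) are assumptions made for illustration; real overflows happen in native code, and Python itself would not let a buffer grow into its neighbor like this.

```python
import struct

BUF_SIZE = 8   # hypothetical size of the local buffer
ADDR_SIZE = 8  # hypothetical size of the saved return address


def make_frame(return_address: int) -> bytearray:
    """Simulate a stack frame laid out as [buffer | saved return address]."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<Q", return_address))


def saved_return_address(frame: bytearray) -> int:
    """Read back the simulated return address stored right after the buffer."""
    return struct.unpack_from("<Q", frame, BUF_SIZE)[0]


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy data to the start of the buffer without a bound check (the bug)."""
    end = min(len(data), len(frame))  # only keeps us inside the simulated memory
    frame[0:end] = data[:end]


def checked_copy(frame: bytearray, data: bytes) -> None:
    """Copy data only if it fits inside the buffer (the fix)."""
    if len(data) > BUF_SIZE:
        raise ValueError("input larger than buffer")
    frame[0:len(data)] = data


if __name__ == "__main__":
    frame = make_frame(0x401000)
    # The first 8 bytes fill the buffer; the next 8 land on the return address.
    payload = b"A" * BUF_SIZE + struct.pack("<Q", 0xDEADBEEF)
    unchecked_copy(frame, payload)
    print(hex(saved_return_address(frame)))
```

After the unchecked copy, the simulated return address holds the value chosen by whoever supplied the input, which is the essence of hijacking control flow. The checked variant rejects the same payload.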
Not all bugs are created equal
A bug is a software flaw. In many cases, bugs do not lead to security breaches or exploits; they just exhibit a behavior that is not expected by the user or the developer. In other cases, a bug may also be a software vulnerability, meaning that it may cause security issues, such as data leaks, denial of service, or exploitation. Exploiting a vulnerability normally leads to privilege escalation or to taking control of the CPU to execute arbitrary code.
Since the first document explaining this process was published (http://phrack.org/issues/49/14.html#article), many countermeasures have been created to stop attackers from exploiting a vulnerability even when one is found in a program. These protections help prevent the mass exploitation of buffer overflow vulnerabilities. However, many other kinds of flaws exist:
- Program logic errors (a mistake during the development phase can cause a program to end in an undefined/unexpected state)
- Buffer overread (where an improper bound check allows an attacker to read program data they are not authorized to access)
- Format string vulnerabilities (https://www.win.tue.nl/~aeb/linux/hh/formats-teso.html)
- Heap overflow (a buffer overflow that takes place in heap memory), among many other kinds of vulnerabilities
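Of the flaws listed above, the buffer overread is easy to illustrate. The following Python sketch simulates a service that echoes back a stored message but trusts the length claimed by the caller. The memory contents and the function names are hypothetical and only serve to show the missing bound check.

```python
# Simulated memory: a 5-byte message followed by unrelated, sensitive data.
MEMORY = bytearray(b"hello" + b"SECRET-KEY-1234")
MSG_LEN = 5  # real length of the message stored at offset 0


def echo_unchecked(claimed_len: int) -> bytes:
    """Return claimed_len bytes, trusting the caller's length (the bug)."""
    return bytes(MEMORY[:claimed_len])


def echo_checked(claimed_len: int) -> bytes:
    """Reject lengths that go past the end of the stored message (the fix)."""
    if not 0 <= claimed_len <= MSG_LEN:
        raise ValueError("claimed length exceeds message size")
    return bytes(MEMORY[:claimed_len])


if __name__ == "__main__":
    print(echo_unchecked(5))   # the legitimate request
    print(echo_unchecked(20))  # the overread: neighboring data leaks out
```

Nothing is overwritten here, so the program keeps running normally. This is why overreads tend to cause data leaks rather than crashes.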
Searching for these vulnerabilities by hand is hard and tedious because of the time it can take to find even one. However, there are techniques that help security researchers discover some types of vulnerabilities automatically, and this book covers those that rely on a tool called a fuzzer. Fuzzers target weaknesses such as the incorrect handling of user-provided data to find an input that makes a program crash. The fuzzer runs the program repeatedly, feeding it different inputs and monitoring each run to detect when the program crashes. To improve the chances of success, fuzzers take a set of initial inputs and mutate them (for example, by flipping some bits in a file structure) to produce unusual inputs that the program cannot handle. The resulting crash may or may not be usable to take advantage of the vulnerability (sadly, not all vulnerabilities are exploitable).
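The run, mutate, and monitor loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: fragile_parser is a hypothetical target that raises an exception to simulate a crash, and the single bit flip in mutate is the simplest possible mutation, far cruder than what real fuzzers do.

```python
import random


def fragile_parser(data: bytes) -> None:
    """Hypothetical target: 'crashes' on one specific malformed header."""
    if len(data) >= 4 and data[:2] == b"FW" and data[2] > 0x7F:
        raise RuntimeError("simulated crash: unsupported length byte")


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Flip one random bit of a non-empty seed input."""
    out = bytearray(seed)
    pos = rng.randrange(len(out))
    out[pos] ^= 1 << rng.randrange(8)
    return bytes(out)


def fuzz(target, seed: bytes, iterations: int, rng: random.Random) -> list:
    """Run target on mutated inputs and collect the ones that crash it."""
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except Exception:
            # In a real fuzzer, this would be a signal such as SIGSEGV.
            crashes.append(candidate)
    return crashes


if __name__ == "__main__":
    found = fuzz(fragile_parser, b"FW\x10\x00", 1000, random.Random(0))
    print(f"{len(found)} crashing inputs, for example {found[:1]}")
```

Passing a seeded random.Random instance keeps a run reproducible, which matters when a crashing input has to be replayed later under a debugger.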
The utility belt
We have already roughly mentioned what we’ll see in each part of this book, as well as what tools we will use throughout. We will use this section to move a step forward and provide a better overview of the tools we will use, as well as install them (we will not deep-dive into these tools as they will be part of future chapters).
Git, Python3, build-essential
Git is a version control system that keeps track of code modifications and lets us store our code on a remote server. One of the main services hosting Git repositories is GitHub, where anybody can upload their artifacts and share them with other people.
Python was first released in 1991 by Guido van Rossum and has exploded as a prototyping language in the last decade thanks to the myriad of libraries written for it. Without any doubt, Python represents a milestone in computer science because it made programming accessible and readable to everyone.
The build-essential package is a basic collection of packages that help compile software on Ubuntu/Debian Linux distributions. Often, Python3 comes preinstalled, and git can be installed with a package manager; for example:
- Arch:
pacman -S git python3 make gcc cmake
- Debian/Ubuntu:
apt-get install git python3 build-essential
- RHEL/CentOS:
yum install git python3 make gcc cmake gcc-c++
- Also, for build essentials in RHEL/CentOS, you can use:
dnf group install "C Development Tools and Libraries" "Development Tools"
- SUSE:
zypper install git python3 make gcc cmake gcc-c++
QEMU
QEMU is an emulator: it lets users emulate different systems, along with some of their peripherals. Internally, QEMU relies on an intermediate representation (IR). Through binary translation, it transforms the instructions of the guest system or binary into the IR and then either compiles that IR into instructions supported by the host architecture (just-in-time mode, faster) or executes the IR with its own interpreter (interpreter mode, slower).
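The two execution modes can be illustrated with a toy model in Python. Everything here is an assumption made for teaching purposes: the three guest instructions, the IR operation names, and the use of generated Python source as a stand-in for host machine code. QEMU’s real translator is far more complex and shares none of these names.

```python
# Toy guest program: (mnemonic, destination register, immediate value).
GUEST_PROGRAM = [("LOADI", "r0", 2), ("ADDI", "r0", 40), ("SHLI", "r0", 1)]

# Hypothetical mapping from guest mnemonics to IR operation names.
GUEST_TO_IR = {"LOADI": "mov_i", "ADDI": "add_i", "SHLI": "shl_i"}


def translate(program):
    """Binary translation step: rewrite guest instructions as IR operations."""
    return [(GUEST_TO_IR[op], reg, imm) for op, reg, imm in program]


def interpret(ir, regs):
    """Interpreter mode: decode and execute every IR operation on each run."""
    for op, reg, imm in ir:
        if op == "mov_i":
            regs[reg] = imm
        elif op == "add_i":
            regs[reg] = regs[reg] + imm
        elif op == "shl_i":
            regs[reg] = regs[reg] << imm
    return regs


def compile_ir(ir):
    """JIT-like mode: emit host code once (here, Python source) and reuse it."""
    templates = {
        "mov_i": "regs[{r!r}] = {i}",
        "add_i": "regs[{r!r}] += {i}",
        "shl_i": "regs[{r!r}] <<= {i}",
    }
    lines = ["    " + templates[op].format(r=reg, i=imm) for op, reg, imm in ir]
    source = "def block(regs):\n" + "\n".join(lines) + "\n    return regs\n"
    namespace = {}
    exec(source, namespace)  # stands in for generating native host code
    return namespace["block"]


if __name__ == "__main__":
    ir = translate(GUEST_PROGRAM)
    print(interpret(ir, {}))
    print(compile_ir(ir)({}))
```

Both paths produce the same register state. The difference is where the decoding cost is paid: the interpreter pays it on every execution, while the compiled block pays it once and can then be cached and reused.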
To use QEMU, we have two options. The first and simplest one is to use a package manager. The command that’s used will depend on the system that we are using. If we look at the QEMU web page, we will see that they provide different sets of commands, depending on the system:
- Arch:
pacman -S qemu
- Debian/Ubuntu:
apt-get install qemu
- RHEL/CentOS:
yum install qemu-kvm
- SUSE:
zypper install qemu
In our case, we will use an Ubuntu system, so we will use the command for Debian/Ubuntu. It must be run as superuser: sudo apt-get install qemu or sudo apt install qemu.
The other option is to download the QEMU source. This can be downloaded from its download web page or directly from git. In both cases, we will compile and install the tool. Sometimes, this option can be a better fit for us if we want to decide what to install or not during the installation phase.
If we decide to download from its web page (the latest version at the time of writing is 6.2), we can use the following commands:
wget https://download.qemu.org/qemu-6.2.0.tar.xz
tar xvJf qemu-6.2.0.tar.xz
cd qemu-6.2.0
./configure
make
make install
Alternatively, if we want to download using git (this will fetch the latest version on the master branch), we can do the following:
git clone https://gitlab.com/qemu-project/qemu.git
cd qemu
git submodule init
git submodule update --recursive
./configure
make
make install
AFL/AFL++
American Fuzzy Lop (AFL) (https://lcamtuf.coredump.cx/afl/) has become the de facto standard for program fuzzing and vulnerability research. Michal Zalewski (https://lcamtuf.coredump.cx/silence/), a famous Google security engineer, developed AFL for internal purposes at Google, a company that owns billions of lines of code and, among them, potentially thousands of vulnerabilities. AFL follows a genetic algorithm: it evolves the initial program inputs over successive generations, which is what makes it smart. Moreover, it offers a suite for analyzing the crash dumps generated by the program being fuzzed. AFL has helped users find thousands of vulnerabilities, even in famous software such as MySQL, Adobe Reader, VLC, and IDA Pro, as well as several browsers.
AFL++ is presented as an evolution of AFL and includes patches to hook into a full system emulator (QEMU) or to instrument a binary (QEMU user mode). In this book, we will start with AFL++ and apply some patches that come from other projects to show how flexible it is to have a fuzzing suite combined with an emulator to hunt for vulnerabilities in embedded firmware. The following is an example of how to install AFL. Throughout this book, we will provide the installation instructions needed for each specific exercise:
git clone https://github.com/google/AFL.git
cd AFL && make
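The genetic approach mentioned above can be sketched as a feedback loop in which mutants that reach new code survive and become parents of later mutants. The Python below is a toy model, not AFL’s implementation: the target function is hypothetical and fakes instrumentation by returning the names of the branches it executed, and the names evolve and mutate are assumptions made for this example.

```python
import random


def target(data: bytes) -> set:
    """Hypothetical instrumented target: returns the set of branches hit."""
    covered = {"entry"}
    if data[:1] == b"F":
        covered.add("magic0")
        if data[1:2] == b"W":
            covered.add("magic1")
            if data[2:3] == b"!":
                covered.add("bug")  # stands in for the crashing path
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one random byte of a non-empty input with a random value."""
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


def evolve(seed: bytes, rounds: int, rng: random.Random):
    """Keep mutants that reach new coverage so later mutants build on them."""
    population = [seed]
    seen = set(target(seed))
    for _ in range(rounds):
        child = mutate(rng.choice(population), rng)
        coverage = target(child)
        if not coverage <= seen:  # a new branch was reached: the child survives
            seen |= coverage
            population.append(child)
        if "bug" in seen:
            break
    return population, seen


if __name__ == "__main__":
    population, seen = evolve(b"AAAA", 100000, random.Random(1))
    print(population[-1], sorted(seen))
```

Guessing the three magic bytes at once by random mutation would take millions of attempts on average. Keeping each partial match as a new parent splits the search into three short steps, which is the intuition behind coverage feedback.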
The Ghidra disassembler
Ghidra is a powerful free alternative to IDA Pro. It was developed by the NSA, which released it publicly in 2019. It’s extremely portable since its UI and most of the disassembler internals are written in Java, so it does not depend on any specific architecture. Some internal components, however, are compiled natively for the different architectures. The Java UI sets Ghidra apart from other disassemblers and makes it very versatile. Ghidra also includes a free decompiler for various architectures, which will be useful when analyzing difficult code.
Installing Ghidra
First of all, as stated previously, Ghidra is written in Java, so we will need to install the Java 11 JDK.
For Linux, follow these steps:
- Download the JDK:
wget https://corretto.aws/downloads/latest/amazon-corretto-11-x64-linux-jdk.tar.gz
- Extract the JDK distribution (the .tar.gz file) to your desired location:
tar xvf <JDK distribution .tar.gz>
- Add the JDK’s bin directory to your PATH. Open ~/.bashrc with an editor of your choice; for example:
vi ~/.bashrc
- At the very end of the file, add the JDK bin directory to the PATH variable:
export PATH=<path of extracted JDK dir>/bin:$PATH
- Save the file.
- Restart any open Terminal windows for changes to take effect.
Once the JDK is installed, go to https://ghidra-sre.org/ and download the ghidra_10.1.2_PUBLIC_20220125.zip file, or a more recent version if there is one. Unzip the archive and execute ghidraRun to start the application. Ghidra’s commands stay consistent across releases, so newer versions will match what we see in this book. If you are hungry for knowledge about this tool, we recommend reading Ghidra Software Reverse Engineering for Beginners from Packt (https://www.packtpub.com/product/ghidra-software-reverse-engineering-for-beginners/9781800207974). Ghidra is mostly used for static analysis, so we will also install the GNU Debugger, gdb, with some plugins and with support for different architectures. This tool can help you analyze executables while they’re running.
GDB Multiarch and GEF/Pwndbg
GDB is the default debugger on Linux systems. It is a command-line debugger, and we can use it to debug binaries built for architectures different from our current one. To do this, we need to install the multiarch version. We will also install a couple of plugins that improve the tool’s interface, since gdb without plugins can be tough at the beginning. These plugins display the stack, the registers, and the assembly code at every step. Throughout this book, we will learn how to use gdb for debugging purposes. The installation commands for the different environments are as follows:
- Arch:
pacman -S gdb-multiarch
- Debian/Ubuntu:
apt-get install gdb-multiarch
- SUSE:
zypper install gdb-multiarch
Then, download or clone https://github.com/apogiatzis/gdb-peda-pwndbg-gef and, from its main directory, execute install.sh.
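Once the tools from this section are installed, a short Python script can confirm that they are reachable from the command line. This is a sketch under assumptions: the executable names in TOOLS are typical examples that vary by distribution and version (for instance, QEMU ships one binary per emulated architecture), so adjust the list to your system.

```python
import shutil

# Executable names are assumptions; they vary by distribution and version.
TOOLS = ["git", "python3", "qemu-system-arm", "gdb-multiarch", "afl-fuzz"]


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(TOOLS)
    print("missing:", ", ".join(missing) if missing else "none")
```

The script only checks that each name resolves on the PATH; it does not verify versions or that a tool was built with the options a later exercise needs.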
Avatar2
EURECOM, a research institute in the south of France, often hosts very talented students and researchers. This is where Avatar2 was designed by Marius Muench, Dario Nisi, Aurélien Francillon, and Davide Balzarotti. It’s a Python framework that helps orchestrate embedded systems with the help of QEMU. It contains code to patch memory, emulate peripherals, and mock interfaces to bring firmware to a specific state. Some recent Samsung baseband vulnerabilities (disclosed in September 2020) were discovered thanks to Avatar2, AFL, and QEMU. These vulnerabilities were extremely critical and led to remote code execution within the communication processor (CP) of Samsung phones.