Breaking down shellcode
Shellcode can be written for various architectures. The main architectures that you are likely to see in your day-to-day working life are x86-64 and ARM. There are big differences between these two CPU architectures. For instance, x86-64 is a Complex Instruction Set Computing (CISC) architecture, while ARM is a Reduced Instruction Set Computing (RISC) architecture.
The following table highlights some of the key differences between these two instruction sets:
You will be able to find more in-depth information on the differences between the CISC and RISC architectures on the internet. The aim of this book is not to dive into the complexity of CPU architectures. However, having a good idea of the CPU architecture of your target will ultimately help you to better craft your shellcode.
To write shellcode, you need a good understanding of assembly language. A computer cannot execute assembly language directly, because the processor understands only machine code, also known as machine language. Assembly language provides a human-readable interface layer to machine language, and a program called an assembler translates it into machine code.
Here is a simple Hello World program in assembly language code, which is specific to Linux operating systems:
section .text
    global _start           ;must be declared for linker (ld)

_start:                     ;tells linker entry point
    mov edx, len            ;message length
    mov ecx, msg            ;message to write
    mov ebx, 1              ;file descriptor (stdout)
    mov eax, 4              ;system call number (sys_write)
    int 0x80                ;call kernel

    mov eax, 1              ;system call number (sys_exit)
    int 0x80                ;call kernel

section .data
msg db 'Hello, world!', 0xa ;string to be printed
len equ $ - msg             ;length of the string
When the preceding code is assembled, linked, and executed, it displays the text defined in the msg db 'Hello, world!' line of the .data section.
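To give a feel for what an assembler does with a listing like the one above, here is a minimal sketch in Python that hand-encodes a few of its instructions into machine code bytes. It is not a real assembler: the encode function and the REG32 table are hypothetical names introduced for illustration, and the sketch assumes 32-bit x86 and supports only the mov register, immediate and int forms.

```python
import struct

# Register numbers used by the x86 "mov r32, imm32" encoding
# (the opcode byte is 0xB8 plus the register number).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}


def encode(instruction: str) -> bytes:
    """Encode a tiny, illustrative subset of 32-bit x86 instructions.

    Supported forms (an assumption of this sketch, not a full assembler):
      mov <reg>, <imm32>
      int <imm8>
    """
    mnemonic, _, operands = instruction.strip().partition(" ")
    if mnemonic == "mov":
        reg, imm = (part.strip() for part in operands.split(","))
        # Opcode byte, then the immediate as a 4-byte little-endian value.
        return bytes([0xB8 + REG32[reg]]) + struct.pack("<I", int(imm, 0))
    if mnemonic == "int":
        # 0xCD is the opcode for int, followed by the 1-byte interrupt number.
        return bytes([0xCD, int(operands.strip(), 0)])
    raise ValueError(f"unsupported instruction: {instruction!r}")


if __name__ == "__main__":
    for line in ["mov ebx, 1", "mov eax, 4", "int 0x80"]:
        print(f"{line:<12} -> {encode(line).hex(' ')}")
```

Under these assumptions, mov eax, 4 becomes the bytes b8 04 00 00 00 and int 0x80 becomes cd 80. Byte sequences like these are what shellcode ultimately consists of.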
Assembly language consists of three main components: executable instructions, assembler directives, and macros. Executable instructions tell the processor what to do, assembler directives tell the assembler how to build the program (section and global in the preceding example are directives), and macros provide a text substitution mechanism. In the next chapter, we will cover assembly language in more detail.
Machine language is a very low-level programming language. It is written in binary, in other words, 1s and 0s. Because these bit patterns are exactly what the processor decodes, the computer can execute them directly. The flip side is that machine language is very difficult for humans to understand. So, imagine trying to read shellcode in the form of machine language: it could be nearly impossible, depending on the complexity of the code. Machine language executes very quickly because the processor runs it directly, with no translation step at runtime.
A sample of machine language is as follows:
1110 0001 1010 0010 0010 0011 0000 0011
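Shellcode is rarely written out as 1s and 0s; the same bits are usually shown as hexadecimal bytes. The following Python sketch converts the binary sample above into that form. The bits_to_bytes helper is a hypothetical name used for illustration, and the sketch assumes the leftmost bits form the first byte, which may not match the order in which a given CPU stores the instruction in memory.

```python
SAMPLE = "1110 0001 1010 0010 0010 0011 0000 0011"


def bits_to_bytes(bits: str) -> bytes:
    """Convert a string of 1s and 0s (spaces ignored) into raw bytes."""
    digits = bits.replace(" ", "")
    if len(digits) % 8 != 0:
        raise ValueError("expected a whole number of 8-bit bytes")
    # Assumption: the leftmost bits are treated as the first (most significant) byte.
    return int(digits, 2).to_bytes(len(digits) // 8, byteorder="big")


if __name__ == "__main__":
    raw = bits_to_bytes(SAMPLE)
    print(raw.hex())                             # e1a22303
    print("".join(f"\\x{b:02x}" for b in raw))   # \xe1\xa2\x23\x03
```

The second printed form, with each byte written as \x followed by two hexadecimal digits, is the notation commonly used when shellcode is embedded as a string in a high-level language.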
The key takeaway is that assembly language is the practical way for people to read and write machine language.
The more common type of programming language you may come across is a high-level programming language. This type of language is more human-friendly and readable. Examples include C, C++, and Python. The first example of shellcode at the beginning of this chapter was written in C; that is what a high-level programming language looks like.
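For comparison with the assembly Hello World shown earlier, here is a rough equivalent sketched in Python, chosen here only for illustration. The write_message function and MESSAGE constant are hypothetical names. The sketch uses os.write, which takes a file descriptor and a byte string, so it mirrors the descriptor and message passed to sys_write in the assembly version.

```python
import os

# Same bytes as msg in the assembly listing; 0xa is the newline character.
MESSAGE = b"Hello, world!\n"


def write_message(fd: int = 1) -> int:
    """Write MESSAGE to a file descriptor and return the number of bytes written.

    File descriptor 1 is stdout, the same value the assembly version loads into ebx.
    """
    return os.write(fd, MESSAGE)


if __name__ == "__main__":
    write_message()
```

The high-level version needs no register setup or system call numbers. The language runtime handles those details, which is why such code is easier to read but further removed from the bytes that shellcode is made of.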
As you progress in the book, you will better understand the uses of the various components that make up shellcode. This includes the various tools that can be used to create shellcode, convert code to assembly language, and obtain machine code.