Debugging kernel modules with KGDB
To debug a kernel module under KGDB, pretty much everything remains the same as when debugging in-tree kernel code with GDB. The main difference is this: because modules can be loaded and unloaded on demand, GDB can't automatically see where the target module's ELF code and data sections live in (virtual) memory – we have to tell it. Let's see exactly how to do so.
Informing the GDB client about the target module's locations in memory
The kernel makes the ELF section information of every kernel module available under sysfs, here: /sys/module/<module-name>/sections/.*. Do an ls -a on this directory to see the so-called hidden files as well. For example, assuming that the usbhid kernel module is loaded up (you can run lsmod to check, of course), we can see its sections (output truncated) with the following:
ls -a /sys/module/usbhid/sections/
./          [...]  .rodata     .symtab
[...]       .bss   .init.text  [...]
.text       [...]  .data       [...]
.text.exit  [...]  .exit.text  [...]
Looking at the content of the files beginning with a period (.) – as root, of course – you'll see the kernel virtual address where that section of the module is loaded into memory. For example, a few of the sections of the usbhid module follow (this is on my x86_64 Ubuntu 20.04 guest – I've reformatted the output a bit for readability):
cd /sys/module/usbhid/sections
cat .text .rodata .data .bss
0xffffffffc033b000
0xffffffffc0348060
0xffffffffc034e000
0xffffffffc0354f00
Now, we can feed this information to GDB via its add-symbol-file command! Specify the module's text section address first (the content of the .text pseudofile), followed by each individual section in the format -s <section-name> <address>. For example, with respect to the usbhid module, we do this:
(gdb) add-symbol-file </path/to/>usbhid.ko 0xffffffffc033b000 \
      -s .rodata 0xffffffffc0348060 \
      -s .data 0xffffffffc034e000 \
      [...]
To more or less automate this (it's a bit tedious to type it all in manually, right?), I make use of a cool script (slightly modified) from the venerable LDD3 book! Our copy's here: ch11/gdbline.sh. It works essentially by looping over most of the . (dot) files in /sys/module/<module>/sections, printing out a GDB command string – of the following form – that we can simply copy-paste into GDB:
add-symbol-file </path/to/module.ko> <text-addr> \
    -s <section> <section-addr> \
    -s <section> <section-addr> \
    [...]
Do check it out (we'll cover using it with an example soon enough – read on!).
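If you're curious how such a script works, here's a minimal sketch of the idea (this is not the actual ch11/gdbline.sh, just an illustration; it takes the module name and the path to the .ko file as parameters, and must run as root in order to read the section pseudofiles):
#!/bin/sh
# Minimal sketch: emit a GDB add-symbol-file command line for a loaded module
# Usage: ./gdbline.sh <module-name> </path/to/module.ko>
MOD=$1
KO=$2
cd /sys/module/"${MOD}"/sections || exit 1
CMD="add-symbol-file ${KO} $(cat .text)"
for sec in .[a-z]*; do
    [ "${sec}" = ".text" ] && continue
    CMD="${CMD} -s ${sec} $(cat "${sec}")"
done
echo "${CMD}"
The output is a single (long) add-symbol-file line, which is perfectly fine to paste into GDB as-is.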
Step by step – debugging a buggy module with KGDB
As a demo, let's debug via KGDB a slightly modified – and very simple – version of our earlier ch7/oops_tryv2 module. We call it ch11/kgdb_try. It uses a delayed workqueue (one whose work function begins execution only after a specified delay has elapsed). In the work function, we (very deliberately – it's very contrived) cause a kernel panic by performing an out-of-bounds write – an overflow – on a stack memory buffer. Here are the relevant code paths. First, the init function, where the delayed work is initialized and scheduled to run:
// ch11/kgdb_try/kgdb_try.c
static int __init kgdb_try_init(void)
{
	pr_info("Generating Oops via kernel bug in a delayed workqueue function\n");
	/* Have do_the_work() run via a delayed workqueue, 2,500 ms from now */
	INIT_DELAYED_WORK(&my_work, do_the_work);
	schedule_delayed_work(&my_work, msecs_to_jiffies(2500));
	return 0; /* success */
}
Why do we use a delayed workqueue, with, as you can see, the delay set to 2.5 seconds? This is done just so you have sufficient time to add the module's symbols to GDB before the kernel Oops'es (you'll soon see us doing this)! The actual – and very contrived – bug is here, within the worker routine:
static void do_the_work(struct work_struct *work)
{
	u8 buf[10];
	int i;

	pr_info("In our workq function\n");
	/* BUG: on the last iteration, i == 10, so buf[10] is written to -
	 * one byte beyond the end of the 10-element array! */
	for (i = 0; i <= 10; i++)
		buf[i] = (u8)i;
	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, buf, 10);
	[...]
The bug – the local buffer overflow that occurs when i reaches the value 10 (the array has only 10 elements, indexed 0 through 9, and we're attempting to access the non-existent eleventh element at buf[10]!) – though seemingly trivial, caused my entire target system to simply freeze when run without KGDB! This is because, internally, the kernel panicked! Try it out and you'll see... Of course, recollect that kernel memory checkers – remember KASAN! – will certainly catch bugs like this.
This time, to try something a little different from last time (where we debugged the kernel at early boot), we'll use an x86_64 QEMU guest system as the target (instead of the ARM one we used previously). To do so, we'll set up a vanilla 5.10.109 kernel for KGDB, of course (as covered in the Configuring the kernel for KGDB section), and reuse (open source) code from here to set up the root filesystem (it's Debian Stretch): [Linux Kernel Exploitation 0x0] Debugging the Kernel with QEMU, K Makan, Nov 2020 (http://blog.k3170makan.com/2020/11/linux-kernel-exploitation-0x0-debugging.html). The blog article generates the rootfs using the Google syzkaller project! Do read through the article for details.
Here are the detailed steps to be carried out – read along and try it out for yourself.
Step 1 – preparing the target system's kernel, root filesystem, and test module on the host
This step involves a bit of work:
- Configuring and building a (debug, KGDB-enabled) kernel for the target system (QEMU emulated x86_64)
- Having a working root filesystem image for the target (so that we can store our module(s), log in, and so on)
- Building the test module against the target kernel
Let's proceed!
Step 1.1 – configuring and building the target kernel
We'll keep it brief:
- Download and extract the kernel source tree for an appropriate kernel. Let's use the 5.10.109 kernel (as it's within the 5.10 LTS series and matches the one we used for the ARM target). Keep the source tree in any convenient location on your system and note it (for the purposes of this demo, let's say you've installed the kernel source tree here: ~/linux-5.10.109).
- Configure the kernel in the usual manner (via the make menuconfig UI), taking into account the fact that you must enable support for KGDB and related items – we've covered this in detail in the Configuring the kernel for KGDB section. For your reference, I've kept my kernel config file here: ch11/kconfig_x86-64_target.
Tip
With recent 5.10 (or newer) kernels, the build could fail with an error such as this:
make[1]: *** No rule to make target 'debian/canonical-revoked-certs.pem', needed by 'certs/x509_revocation_list'
A quick fix is to do this:
scripts/config --disable SYSTEM_REVOCATION_KEYS
scripts/config --disable SYSTEM_TRUSTED_KEYS
Then, retry the kernel build.
- Build the kernel via make -j[n] all. Both the compressed kernel image (arch/x86/boot/bzImage) and the uncompressed kernel image with symbols (vmlinux) are generated. As this is all we require for this demo, we skip the (typical) remaining steps of modules and kernel/bootloader installation.
Here are my custom KGDB-enabled kernel images:
$ ls -lh arch/x86/boot/bzImage vmlinux
-rw-rw-r-- 1 osboxes osboxes 7.9M May 3 13:29 arch/x86/boot/bzImage
-rwxrwxr-x 1 osboxes osboxes 240M May 3 13:29 vmlinux*
Let's move along...
Step 1.2 – obtaining a working root filesystem image for the target
We'll of course require a target root filesystem (or rootfs). Further, it needs to hold our test kernel module (compiled against the same target kernel) plus the gdbline.sh and doit wrapper scripts (we explain the purpose of the latter shortly). Now, building a rootfs from scratch isn't a trivial task; thus, to ease the effort, we provide a fully functional root filesystem image based on the Debian Stretch distro.
We covered downloading the compressed rootfs image file in the Technical requirements section (if you haven't yet done so, please ensure you download it now). Now extract it:
7z x rootfs_deb.img.7z
It will get extracted into a directory named images/. You now have the uncompressed and ready-to-use target rootfs binary image (of size 512 MB) here: ch11/images/rootfs_deb.img.
FYI, you can always edit the rootfs image on the host by loop mounting it (when it's not in use!), editing its content, and then unmounting it (see Figure 11.7). Here, you don't need to do this yourself; it's been done and the target rootfs has been supplied to you.
We've kept all required files for the module debug demo on the target rootfs under the /myprj directory. As a quick sanity check, let's loop mount the target root filesystem image file and peek into it (ensure you create the mount point directory, /mnt/tmp here, first):
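A quick way to do so – assuming you're within the ch11/ directory on the host – is something along these lines:
sudo mkdir -p /mnt/tmp
sudo mount -o loop images/rootfs_deb.img /mnt/tmp
ls /mnt/tmp/myprj/
sudo umount /mnt/tmp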
Don't forget: only loop mount and edit the target rootfs when it's not in use via QEMU (or another hypervisor). Unmount it when done!
On our host system, here's what the directory tree structure under ch11/ should now look like:
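Roughly, that is – this is just a sketch pieced together from the files we've mentioned, not an exact listing (your tree may show a few more entries):
ch11/
├── gdbline.sh
├── images/
│   └── rootfs_deb.img
├── kconfig_x86-64_target
├── kgdb_try/
│   ├── kgdb_try.c
│   └── Makefile
└── run_target.sh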
Right, let's continue.
Step 1.3 – building the module for the target kernel
One more step's required here: the test module (under ch11/kgdb_try) needs to be built and deployed on both the target and host systems. (Actually, it's already deployed on the target rootfs; we need to build it on our host.) So, cd to the ch11/kgdb_try directory and issue the make command to build it.
Importantly, the Makefile must take into account the fact that this module's built against the target 5.10.109 kernel (and not the native one)! So, we've changed the KDIR variable within the Makefile to reflect this location:
// ch11/kgdb_try/Makefile
#@@@@@@@@@@@@ NOTE! SPECIAL CASE @@@@@@@@@@@@@@@@@
# We specify the build dir as the linux-5.10.109 kernel src tree; this is as
# we're using this as the target x86_64 kernel and debugging this module over KGDB
KDIR ?= ~/linux-5.10.109
If the kernel's in a different location on your system, update the Makefile's KDIR variable first and then build the module.
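In effect, the build boils down to this (assuming, as elsewhere in this chapter, that the book's source is at <book_src> and that KDIR points at your 5.10.109 source tree):
cd <book_src>/ch11/kgdb_try
make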
Note
If you make any changes to the kgdb_try.c source and rebuild, you'll need to update the module within the target rootfs as well: loop mount the rootfs image file, copy the new kgdb_try.ko module into its /myprj directory, and then unmount it.
Good job! Let's move on to the next step...
Step 2 – target startup and wait at early boot
Start the x86_64 target (via QEMU). We expect you've installed qemu-system-x86_64 by now (as advised in the Technical requirements section):
cd <book_src>/ch11
qemu-system-x86_64 \
    -kernel ~/linux-5.10.109/arch/x86/boot/bzImage \
    -append "console=ttyS0 root=/dev/sda earlyprintk=serial rootfstype=ext4 rootwait nokaslr" \
    -hda images/rootfs_deb.img \
    -nographic -m 1G -smp 2 \
    -S -s
For your convenience, the same command's available within a wrapper script here: ch11/run_target.sh. Simply run it, passing the kernel and rootfs image files as parameters.
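For example, something along these lines should do (a sketch – we're assuming the script takes the kernel image as its first parameter and the rootfs image as its second, as the sentence above implies):
cd <book_src>/ch11
# assumption: kernel image first, rootfs image second
./run_target.sh ~/linux-5.10.109/arch/x86/boot/bzImage images/rootfs_deb.img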
Tip
Running QEMU with the -enable-kvm option switch can make guest execution (much!) faster. This requires hardware-level virtualization support, of course (implying that CPU virtualization is enabled at the firmware/BIOS level). On x86, you can check with egrep "^flags.*(vmx|svm)" /proc/cpuinfo. If there's no output, it isn't enabled and won't work. Also, this could fail if any other hypervisor is running and making use of KVM (your Ubuntu guest on VirtualBox, perhaps) – in effect, if nested virtualization isn't supported by KVM.
Right, the guest kernel starts and pretty much immediately waits, due to the effect of QEMU's -S option switch (see Figure 11.9).
Step 3 – host system remote GDB startup
On the host (which in our case is the Ubuntu x86_64 guest), let's set up the GDB client to debug the target system. So, cd into the target kernel source tree (here, we're taking it as being in ~/linux-5.10.109). Run GDB, passing along the uncompressed 5.10.109 kernel image (vmlinux) as a parameter (see Figure 11.10), enabling GDB to read in all symbols. In addition, we employ the GDB initialization/startup file ~/.gdbinit to define a simple macro (we cover GDB macros in the GDB custom macros in its startup file section). Here's the connect_qemu macro definition:
cat ~/.gdbinit
[...]
set auto-load safe-path /
define connect_qemu
  target remote :1234
  hbreak start_kernel
  hbreak panic
  #hbreak do_init_module
end
On startup, GDB will parse in its content, allowing us to run our custom macro connect_qemu, which connects to the target and sets up a couple of hardware breakpoints (via GDB's hbreak command). Here are a few points regarding the GDB startup file content:
- The set auto-load safe-path / directive is to allow GDB to parse in and use various Python-based GDB helper scripts. We cover the details in the Setting up and using GDB scripts with CONFIG_GDB_SCRIPTS section.
- A tip, useful at times: adding the kernel function do_fsync() as a breakpoint is a convenience, allowing you to break into GDB by typing sync on the target command line (a variant of the connect_qemu macro doing just this is sketched after this list).
- We add the start_kernel() hardware breakpoint here simply as a demo, for no other reason... It's pretty much the first C function hit as the kernel boots up!
- We have a commented-out hardware breakpoint on the function do_init_module(). This can be very helpful, allowing you to debug any module's init code path straight away (details follow in the Debugging a module's init function section).
Tip
Ensure you use hardware breakpoints (via GDB's hbreak command) for your key breakpoints, and not software breakpoints! The info breakpoints command (abbreviated as simply i b) will reveal all currently defined breakpoints and watchpoints.
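If you'd like to fold in the do_fsync() convenience breakpoint mentioned above, the macro might look like this (a variant sketch of the same connect_qemu macro):
define connect_qemu
  target remote :1234
  hbreak start_kernel
  hbreak panic
  hbreak do_fsync
  #hbreak do_init_module
end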
A couple of screenshots will help clarify things. First, the state of the target kernel just after boot:
Here's a screenshot of running the GDB client on the host (from the kernel source tree location) and issuing our connect_qemu macro:
Fantastic – let's continue...
Step 4 – target system: install the module and add symbols to GDB
When debugging with KGDB, you'll need to insmod the (possibly buggy) module and add its symbols (as explained in the Informing the GDB client about the target module's locations in memory section). But – in this demo at least! – you need to do all this quickly, before it actually crashes! So, on the target rootfs, we have a simple wrapper script (/myprj/doit) to do the following:
- Set the (target) kernel to panic on Oops.
- insmod the module on the target system (the one running with the GDB server component, that is, with KGDB enabled, of course).
- Execute our gdbline.sh script. It generates the key add-symbol-file GDB command! Quickly now...
- We – quickly, before the kernel Oops'es and panics! – switch to the host system GDB and press ^C, interrupting (and stopping) the target kernel. (Whew, now we're safe.) We then copy-paste the GDB add-symbol-file command that was generated on the target, informing GDB about the module's symbols.
- Add a hardware breakpoint for the routine of interest. Here, we run hbreak on our workqueue function do_the_work().
Here's the code of the target rootfs /myprj/doit script (which is itself already embedded within the target rootfs image):
echo 1 > /proc/sys/kernel/panic_on_oops
sudo insmod ./kgdb_try.ko
sudo ./gdbline.sh kgdb_try ./kgdb_try.ko
So, let's get going. First, have the target continue (type c) to boot up, log in to it (as required), and run this helper script to set things up. Of course, the target first hits the start_kernel() hardware breakpoint. Great – you can look around, then type c to have GDB continue the target. It boots up fully... (it can take a moment – be patient). The target kernel now asks you to log in. Here, simply pressing the Enter key is sufficient, as we simply enter the Debian maintenance mode and work there – it's fine to do so:
Now, a key part of this exercise: on the target root filesystem, cd to the /myprj directory and run our wrapper doit script. It runs, generating the output – the add-symbol-file command we must issue within GDB! You'll realize, of course, that the (buggy) kgdb_try.ko module is right now executing its code paths. As we're using a delayed workqueue, we've bought some time (2.5 s here) before the buggy do_the_work() code runs.
Quickly now! Switch to the host window where our client GDB process is running and press ^C (Ctrl + C). This has GDB break in – the target's execution is stopped; it's now frozen (whew!). This is important, as otherwise, the bug can trigger before we set up the breakpoint on our buggy module. In Figure 11.12, you can see our typing of ^C in the right-side host window. The following screenshot reveals the action:
Great job! Now do the following:
- From the target window (the left-side one in Figure 11.12), copy the output of our gdbline.sh script – the GDB add-symbol-file command and whatever follows (in effect, the content between the ---snip--- delimiters) – into the clipboard.
- Switch back to the host window running the client GDB (the right-side one in Figure 11.12).
- Important! cd to the directory where the kernel module's code is (GDB needs to be able to see it).
- Paste the clipboard content – the complete add-symbol-file <...> command – into GDB. It prompts whether to accept this. Answer yes (y). GDB reads in the module symbols! See this in the (truncated) screenshot:
Super! Now that GDB understands the module memory layout and has its symbols, simply add (hardware) breakpoints as required! Here, we just add the relevant one, the workqueue function:
(gdb) hbreak do_the_work
Hardware assisted breakpoint 3 at 0xffffffffc004a000: file /home/osboxes/Linux-Kernel-Debugging/ch11/kgdb_try/kgdb_try.c, line 43.
(gdb)
By the way, you'll recall we earlier enabled the kernel config GDB_SCRIPTS. This makes several useful Python-based helper scripts available during a GDB kernel debug session (we cover this topic in more detail in the Setting up and using GDB scripts with CONFIG_GDB_SCRIPTS section). As an example, we issue the lx-lsmod helper to show all modules currently loaded (in the target kernel's memory):
(gdb) lx-lsmod
Address            Module                  Size  Used by
0xffffffffc004a000 kgdb_try               20480  0
(gdb)
Cool – its output is as expected. Notice how the kernel virtual address where the module is loaded in memory (0xffffffffc004a000 here) perfectly matches the first parameter to the add-symbol-file command – it's the address of the module's .text (code) section!
Step 5 – debugging the module with [K]GDB
So, finally: we're all set up. We can now go ahead and debug the target module in the usual manner, setting breakpoints, examining data, and stepping through its code!
Within the host (client) GDB process, type c to continue. The target system resumes execution... Soon enough, the delay we specified (2.5 s) before the workqueue function – do_the_work() – runs will elapse. The function begins to execute and is immediately trapped by GDB (don't forget, we set up a hardware breakpoint on it in the previous step!):
Looking at Figure 11.14, we examine the (kernel) stack with the bt (backtrace) GDB command – it's as expected. Next, let's do something interesting: we know the bug's in the loop, when the local variable i reaches the value 10 (needless to say, in a C array, indices begin at 0, not 1). Now, instead of single-stepping through the loop 10 times, we can set up a conditional breakpoint, telling GDB to stop execution when the value of i is, say, 8. This is easily achieved with the following GDB command:
(gdb) b 49 if i==8
FYI, we cover more on this in the Conditional breakpoints section. So, let's proceed:
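Spelled out, the GDB command sequence used here is roughly the following (a sketch – the actual session appears in Figure 11.15): set the conditional breakpoint, have GDB auto-display i, continue until the condition is met, and then keep stepping with n, watching i climb toward (and past!) 10:
(gdb) b 49 if i==8
(gdb) display i
(gdb) c
(gdb) n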
We have GDB continue. The conditional breakpoint is hit... It works: the value of i is 8 (to begin with). Notice how I used the display i GDB command to have GDB always display the value of the variable i (after every step (s) or next (n) GDB command). Look at Figure 11.15 carefully: we find that, though the bug's hit (when i reaches the value 10), execution seems to continue. Yes, for just a short while. The kernel's built-in stack overflow detection code paths do kick in soon enough – and guess what: the kernel panics quite spectacularly! The parameter to panic() is a string – the reason for the panic. Clearly, it's due to kernel stack corruption! The following figure shows all this clearly:
When we have GDB continue the target's execution (by typing c), the panic message details are now seen in the target system console window:
Great going!
There's a nagging issue though: how do you debug a module's early init code in KGDB? That's what we cover next!
Debugging a module's init function
Here, for the purpose of a simple-yet-interesting demo, we used a delayed workqueue. Once the delay elapsed (2.5 s here), the buggy workqueue function executed and resulted in an Oops (and a subsequent panic). We could debug it with KGDB! But, think about this: in a project, what if the module's init function doesn't use a delayed workqueue, just a regular workqueue? Then, the workqueue function will run almost immediately, before you have time to set up a breakpoint on it! How do we debug such situations?
The key is to be able to debug the early module initialization code itself, allowing you to then single-step through it. This could be achieved by setting up a breakpoint on the module's init function itself. Hang on – this may not work out. Think: the setting of the breakpoint has to happen after the insmod command is issued, but by the time you type hbreak kgdb_try_init (or whatever), the bug could trigger!
So, here's a workable solution: set a (hardware) breakpoint on the kernel infrastructure code that performs the actual work when invoking a module's init function – the do_init_module(struct module *mod) function. This can be done at any time, even as part of the connect_qemu (or equivalent) macro that we set up! Then, once the breakpoint's hit, simply proceed debugging from there. You can even check which module's being loaded by looking up the pointer to the module structure (it's the single parameter passed to the do_init_module() function): run the set print pretty GDB command, followed by p *mod, then look for the structure member named name. What a surprise – it reveals the name of the module! I also found the module name this way:
(gdb) x/s mod.mkobj.kobj.name
0xffff8880036eb770:     "kgdb_try"
Neat.
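Putting the pieces together, the session might look roughly like this (a sketch; target-side activity and GDB output are elided):
(gdb) hbreak do_init_module
(gdb) c
[... on the target, run: sudo insmod ./kgdb_try.ko ; the breakpoint is hit ...]
(gdb) set print pretty on
(gdb) p mod->name
(gdb) p *mod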
Exercise
Debug a kernel module's init function using the method just described.
On a project, this whole process – debugging a module via KGDB – can be very powerful: you saw how we can single-step through the module's code, employ the backtrace command to see the (kernel) stack in detail, and examine memory (with GDB's x command) and variable values (and even change variables with GDB's set variable command!). In a later section (Using GDB's TUI mode), we even show how you can single-step through assembly code! All this can result in getting valuable insight into the code's behavior, ultimately (fingers crossed!) helping you find the root cause of that annoying bug.
Awesome going – let's wind up this chapter with a few useful tips and tricks.