Understanding system calls
While looking at the figure explaining the interaction between applications and the Virtual Filesystem, you may have noticed the intermediary layer between user space programs and the Virtual Filesystem; that layer is known as the system call interface. To request some service from the kernel, user space programs invoke the system call interface. These system calls provide the means for end user applications to access the resources in the kernel space, such as the processor, memory, and storage. The system call interface serves three main purposes:
- Ensuring security: System calls prevent user space applications from directly modifying resources in the kernel space
- Abstraction: Applications do not need to concern themselves with the underlying hardware specifications
- Portability: User programs can be run correctly on all kernels that implement the same set of interfaces
System calls are often confused with application programming interfaces (APIs). An API is a set of programming interfaces that defines how two software components communicate. It is implemented in user space and describes how to obtain a particular service. A system call is a much lower-level mechanism: it uses a software interrupt or a dedicated CPU instruction to transfer control to the kernel and make an explicit request. In Linux, the standard C library provides wrapper functions for the system call interface.
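To make the distinction concrete, here is a minimal sketch, assuming a Linux system with Python 3 and a C library that exports `getpid`. It asks for the same kernel service through two different user space APIs: Python's `os.getpid()` and the C library wrapper loaded through `ctypes`. The function name `pid_via_two_apis` is only illustrative.

```python
import ctypes
import os


def pid_via_two_apis():
    """Return the process ID obtained through two different user space APIs."""
    # API 1: Python's os module.
    pid_from_python = os.getpid()

    # API 2: the C library wrapper, looked up among the symbols already
    # loaded into the running interpreter (assumes a Unix-like system).
    libc = ctypes.CDLL(None)
    pid_from_libc = libc.getpid()

    return pid_from_python, pid_from_libc


if __name__ == "__main__":
    print(pid_via_two_apis())
```

Both values should match, because each API ends up requesting the same information from the kernel.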
When a system call that opens a file succeeds, it returns a file descriptor to the calling process. A file descriptor is a non-negative integer that is used to access the file. For example, the open() system call returns a file descriptor for the file it opens. From then on, the program performs all read, write, and other operations on the file through that descriptor.
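The life cycle of a descriptor can be sketched as follows. This is a minimal example, assuming Python 3 on a Unix-like system; the functions in the `os` module used here are thin wrappers around the system calls of the same name, and `roundtrip_through_fd` is a hypothetical helper name.

```python
import os
import tempfile


def roundtrip_through_fd(payload: bytes) -> bytes:
    """Write payload to a temporary file and read it back through one descriptor."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "demo.txt")

        # Opening the file yields a small non-negative integer.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            os.write(fd, payload)           # write takes the descriptor, not the path
            size = os.fstat(fd).st_size     # fstat also works on the descriptor
            os.lseek(fd, 0, os.SEEK_SET)    # move back to the start before reading
            data = os.read(fd, size)
        finally:
            os.close(fd)                    # release the descriptor
    return data


if __name__ == "__main__":
    print(roundtrip_through_fd(b"hello"))
```

After the initial open, the path is never used again; every operation refers to the file by its descriptor.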
By convention, every process starts with at least three open files: standard input, standard output, and standard error, represented by file descriptors 0, 1, and 2, respectively. The next file opened is assigned the lowest unused descriptor, which is normally 3. If we trace a simple file listing with ls under strace, the first openat call returns 3, which is the file descriptor for the file being opened, /etc/ld.so.cache in this case. After that, the fstat, mmap, and close calls use this file descriptor value of 3 to perform further operations:
root@linuxbox:~# strace ls /etc/hosts
execve("/bin/ls", ["ls", "/etc/hosts"], 0x7ffdee289b48 /* 22 vars */) = 0
brk(NULL) = 0x562b97fc6000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=140454, ...}) = 0
mmap(NULL, 140454, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fbaa2519000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
[The rest of the code is skipped for brevity.]
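The trace shows descriptor 3 being handed out, closed, and then handed out again for the next file. The same behavior can be reproduced with a short sketch, assuming Python 3 on a Unix-like system and a single-threaded program. The function name is hypothetical, and the exact numbers depend on which descriptors the interpreter already holds.

```python
import os


def lowest_free_descriptor_demo():
    """Show that a newly opened file receives the lowest unused descriptor."""
    first = os.open(os.devnull, os.O_RDONLY)
    second = os.open(os.devnull, os.O_RDONLY)

    # Closing `first` frees its number, so the next open reuses it.
    os.close(first)
    reused = os.open(os.devnull, os.O_RDONLY)

    os.close(second)
    os.close(reused)
    return first, second, reused


if __name__ == "__main__":
    print(lowest_free_descriptor_demo())
```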
On x86_64 systems, there are around 330 system calls. This number can differ on other architectures. Each system call is identified by a unique integer. You can list the system calls available on your system with the ausyscall command, which prints each system call alongside its number:
root@linuxbox:~# ausyscall --dump
Using x86_64 syscall table:
0 read
1 write
2 open
3 close
4 stat
5 fstat
6 lstat
7 poll
8 lseek
9 mmap
10 mprotect
[The rest of the code is skipped for brevity.]
root@linuxbox:~# ausyscall --dump|wc -l
334
root@linuxbox:~#
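To see that these numbers are what the kernel actually receives, the following sketch invokes write by its number (1 in the x86_64 table above) through the C library's generic `syscall()` function. It assumes Linux on x86_64 with Python 3. On any other platform it falls back to `os.write`, because the numbering differs between architectures. The function name `write_by_syscall_number` is made up for this illustration.

```python
import ctypes
import os
import platform
import sys


def write_by_syscall_number(message: bytes) -> bytes:
    """Send message through a pipe, invoking write by number where possible."""
    read_end, write_end = os.pipe()
    try:
        if sys.platform.startswith("linux") and platform.machine() == "x86_64":
            libc = ctypes.CDLL(None, use_errno=True)
            libc.syscall.restype = ctypes.c_long
            sys_write = 1  # x86_64 only; other architectures use other numbers
            written = libc.syscall(
                ctypes.c_long(sys_write),
                ctypes.c_long(write_end),
                ctypes.c_char_p(message),
                ctypes.c_size_t(len(message)),
            )
            if written < 0:
                raise OSError(ctypes.get_errno(), "raw write system call failed")
        else:
            # Assumption: on other platforms we do not know the number, so
            # use the regular wrapper instead.
            os.write(write_end, message)
        return os.read(read_end, len(message))
    finally:
        os.close(read_end)
        os.close(write_end)


if __name__ == "__main__":
    print(write_by_syscall_number(b"hi"))
```

In everyday code you would call the named wrapper rather than the raw number; the sketch only shows what sits underneath it.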
The following table lists some common system calls:
| System call | Description |
| --- | --- |
| | Open and close files |
| | Create a file |
| | Change the |
| | Mount and unmount filesystems |
| | Change the pointer position in a file |
| | Read and write in a file |
| | Get a file status |
| | Get filesystem statistics |
| | Execute the program referred to by pathname |
| | Checks whether the calling process can access the file pathname |
| | Creates a new mapping in the virtual address space of the calling process |
Table 1.1 – Some common system calls
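A few of the operations described in the table can be tried from Python, whose `os` module exposes wrappers for them. The sketch below is an assumption-laden illustration rather than a reference: it assumes Python 3 on a Unix-like system, `describe_path` is a hypothetical helper, and the exact system call each wrapper issues can vary with the C library and kernel version.

```python
import os
import tempfile


def describe_path(path: str) -> dict:
    """Collect access, file status, and filesystem statistics for path."""
    return {
        # Can the calling process read the file?
        "readable": os.access(path, os.R_OK),
        # File status: size in bytes.
        "size_bytes": os.stat(path).st_size,
        # Filesystem statistics: block size of the filesystem holding the file.
        "fs_block_size": os.statvfs(path).f_bsize,
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"abc")
        tmp.flush()
        print(describe_path(tmp.name))
```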
So, what role do the system calls play in interacting with filesystems? As we’ll see in the succeeding section, when a user space process issues a file-related system call, the request is routed to the Virtual Filesystem. The system call is first handled by the corresponding system call handler in the kernel, which validates the requested operation and then calls the appropriate function in the VFS layer. The VFS layer passes the request on to the appropriate filesystem driver module, which performs the actual operations on the file.
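One visible consequence of this layering is that the same calls work regardless of which filesystem backs a file. The following minimal sketch assumes Python 3; the procfs path `/proc/self/status` is Linux-specific and is skipped where it does not exist. It reads from a regular temporary file and from procfs with identical code. Both function names are illustrative.

```python
import os
import tempfile


def read_first_bytes(path: str, count: int = 16) -> bytes:
    """Read up to count bytes from path with the same calls for any filesystem."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, count)
    finally:
        os.close(fd)


def compare_filesystems() -> dict:
    """Read from a regular file and, if present, from the procfs pseudo-filesystem."""
    results = {}
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"regular data")
        tmp.flush()
        results["regular file"] = read_first_bytes(tmp.name)

    proc_path = "/proc/self/status"  # Linux-specific; skipped elsewhere
    if os.path.exists(proc_path):
        results["procfs"] = read_first_bytes(proc_path)
    return results


if __name__ == "__main__":
    print(compare_filesystems())
```

The calling code never names a filesystem type; choosing the driver that serves each read is left to the kernel.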
We need to understand the why here – why would the process interact with the Virtual Filesystem and not the actual filesystem on the disk? In the upcoming section, we’ll try to figure this out.
To summarize, the system call interface in Linux provides generic methods that user space applications can use to access resources in the kernel space.