Understanding storage in a modern-day data center
Compute, storage, and networking are the basic building blocks of any infrastructure. How well your applications perform often depends on the combined performance of these three layers. The workloads running in a modern data center vary from streaming services to machine learning applications. With the meteoric rise and adoption of cloud computing platforms, all of these building blocks are now abstracted from the end user. Adding more hardware resources to an application as it becomes resource-hungry is the new normal, and troubleshooting performance issues is often skipped in favor of migrating applications to better hardware platforms.
Of the three building blocks, storage is most often considered the bottleneck. For applications such as databases, the performance of the underlying storage is of prime importance. When infrastructure hosts mission-critical and time-sensitive applications such as Online Transaction Processing (OLTP), the performance of storage comes under particular scrutiny. The smallest delay in servicing I/O requests can impact the overall response of the application.
The most common metric used to measure storage performance is latency. The response times of storage devices are usually measured in milliseconds; compare that with your average processor or memory, where response times are measured in nanoseconds, and you’ll see how the performance of the storage layer can impact the overall working of your system. This results in a state of incongruity between what applications require and what the underlying storage can actually deliver. For the last few years, most of the advancements in modern storage drives have been geared toward capacity, while the performance of the storage hardware has not progressed at the same rate. Next to the compute functions, storage pales in comparison. For these reasons, it is often termed the three-legged dog of the data center.
Having made a point about the choice of storage medium, it’s pertinent to note that no matter how powerful the hardware is, it will always have limits. It’s equally important for the application and operating system to be tuned to the hardware. Fine-tuning your application, operating system, and filesystem parameters can give a major boost to overall performance. To utilize the underlying hardware to its full potential, all layers of the I/O hierarchy need to function efficiently.
Interacting with storage in Linux
The Linux kernel makes a clear distinction between user space and kernel space processes. All the hardware resources, such as the CPU, memory, and storage, lie in kernel space. Any user space application that wants to access resources in kernel space has to issue a system call, as shown in Figure 1.1:
Figure 1.1 – The interaction between user space and kernel space
User space refers to all the applications and processes that live outside of the kernel. The kernel space includes programs such as device drivers, which have unrestricted access to the underlying hardware. The user space can be considered a form of sandboxing to restrict the end user programs from modifying critical kernel functions.
This concept of user and kernel space is deeply rooted in the design of modern processors. A traditional x86 CPU uses protection domains, called rings, to share and limit access to hardware resources. The processor offers four rings, or modes, numbered 0 to 3, but modern operating systems use only two of them: ring 0 and ring 3. User space applications run in ring 3, which has limited access to kernel resources. The kernel occupies ring 0, where kernel code executes and interacts with the underlying hardware.
When a process needs to read from or write to a file, it has to interact with the filesystem structures on top of the physical disk. Every filesystem uses different methods to organize data on the physical disk. The request from the process doesn’t reach the filesystem or physical disk directly; for the process’s I/O request to be served by the physical disk, it has to traverse the entire storage hierarchy in the kernel. The first layer in that hierarchy is known as the Virtual Filesystem. The following figure, Figure 1.2, highlights the major components of the Virtual Filesystem:
Figure 1.2 – The Virtual Filesystem (VFS) layer in the kernel
The storage stack in Linux consists of a multitude of cohesive layers, all of which ensure that the access to physical storage media is abstracted through a unified interface. As we move forward, we’re going to build upon this structure and add more layers. We’ll try to dig deep into each of them and see how they all work in harmony.
This chapter will focus solely on the Virtual Filesystem and its various features. In the coming chapters, we’re going to explain and uncover some under-the-hood workings of the more frequently used filesystems in Linux. However, bearing in mind the number of times the word filesystem is going to be used here, I think it’s prudent to briefly categorize the different filesystem types, just to avoid any confusion:
- Block filesystems: Block- or disk-based filesystems are the most common way to store user data. These are the filesystems a regular operating system user mostly interacts with. Filesystems such as Extended filesystem versions 2/3/4 (Ext2/3/4), XFS, Btrfs, FAT, and NTFS are all categorized as disk-based or block filesystems. These filesystems speak in terms of blocks. The block size is a property of the filesystem and can only be set when creating a filesystem on a device; it defines the logical unit of storage allocation and retrieval the filesystem uses when reading or writing data. A device that can be accessed in terms of blocks is, therefore, called a block device. Any storage device attached to a computer, whether an internal hard drive or an external USB drive, can be classified as a block device. Traditionally, block filesystems are mounted on a single host and do not allow sharing between multiple hosts.
- Clustered filesystems: Clustered filesystems are also block filesystems and use block-based access methods to read and write data. The difference is that they allow a single filesystem to be mounted and used simultaneously by multiple hosts. Clustered filesystems are based on the concept of shared storage, meaning that multiple hosts can concurrently access the same block device. Common clustered filesystems used in Linux are Red Hat’s Global File System 2 (GFS2) and Oracle Clustered File System (OCFS).
- Network filesystems: The Network File System (NFS) is a protocol that allows for remote file sharing. Unlike regular block filesystems, NFS is based on the concept of sharing data between multiple hosts. NFS works on a client-server model: the backend storage is provided by an NFS server, and the host systems on which the NFS filesystem is mounted are called clients. Connectivity between client and server is achieved over conventional Ethernet. All NFS clients share a single copy of a file on the NFS server. NFS doesn’t offer the same performance as block filesystems, but it is still used in enterprise environments, mostly to store long-term backups and share common data.
- Pseudo filesystems: Pseudo filesystems exist in the kernel and generate their content dynamically. They are not used to store data persistently and do not behave like regular disk-based filesystems such as Ext4 or XFS. The main purpose of a pseudo filesystem is to allow user space programs to interact with the kernel. Directories such as /proc (procfs) and /sys (sysfs) fall under this category. These directories contain virtual or temporary files that include information about the different kernel subsystems. These pseudo filesystems are also a part of the Virtual Filesystem landscape, as we’ll see in the Everything is a file section.
Now that we have a basic idea about user space, kernel space, and the different types of filesystems, let’s look at how an application can request resources in kernel space through system calls.