Database system architecture overview
Database systems (DBSs) are made up of databases and data management systems (DBMSs). A database comprises the physical files holding a collection of data. It consists of data files, log files, parameter files, and so on. A database is commonly controlled by a DBMS. A database instance is a set of background processes manipulating the data and memory structures used for data processing, evaluation, and retrieval. Background processes are primarily responsible for asynchronous I/O data operations but also manage memory structures and general maintenance tasks. There are many background process categories. The most relevant for this book are the following:
- Database Writer (DBWn) is responsible for writing a modified data block from the buffer cache memory structure to the particular blocks of the files that constitute the database.
- Log Writer (LGWR) is responsible for the transaction logging by capturing data and storing it in the online redo log file.
- Process Monitor (PMON) performs recovery if a user process fails by cleaning the caches and deallocating the assigned session resources.
- System Monitor (SMON) performs instance recovery in case of its failure.
- Archiver (ARCn) is responsible for copying the online redo log file if a log switch occurs. Log files are consecutively rewritten. However, by copying them, it is possible to reconstruct the database into a historical image by identifying any changes made since a certain point in the archived logs.
- Checkpoint (CKPT) is responsible for the checkpoint event management by taking all modified buffers and writing their content to data files. Control files are also modified.
- Manageability Monitor (MMON) performs maintenance tasks, such as issuing alerts, taking snapshots, and capturing statistics.
- Job Queue Processes (CJQ0 and Jnnn) are responsible for launching and executing the jobs (scheduled user-defined tasks that are executed once or planned to be executed periodically).
The general architecture of the database server is shown in Figure 1.17. To connect the client to the server, first, the user process is invoked on the client site, contacting the database server listener. By cooperating with the other background processes (mostly PMON), a new server process is created. It takes the small memory structure called the PGA that is private to the server process. It stores the current state, cursor variables, local variables, parameters, and so on. The server process is the manager of the user process on the server side, serving user requests for processing. After creating a server process, a client can communicate with the server process directly, without the listener.
Figure 1.17 – Oracle database server architecture
Each instance is formed by the background processes already discussed, along with memory structures, which we will learn about in the following section.
Memory structures
You are now adequately familiar with the background processes. However, what about the memory structures they operate? Dozens of structures are part of the memory that ensures performance and maintenance activities. In this section, we’ll mainly look at memory structures applicable to the database data. These structures are shared among the sessions forming the SGA. Let’s look at them in some detail:
- The SGA is allocated when an instance is started up and released on shutdown. It consists of various memory structures. Most of them are dynamic in size. The available structures can be divided into two types – required structures (such as buffer caches, log buffers, and shared pools) or optional structures (such as large pools, Java pools, or Streams pools).
- The Database buffer cache is a work area for executing SQL as an interlayer between databases. All data changes are done in the memory and then saved to the database. The buffer cache is a block-oriented matrix. The block itself can be clean (no changes have been made to the data held there), dirty (data with changes), or empty (not yet used).
- The Log buffer is a small, short-term staging area holding data change vectors (modifications applied to data) before they are written to the redo log file on the disk. Redo logging ensures that no data can be lost. Due to performance issues, redo data is not directly written to the redo log file, but a near-real-time approach is used, processing in batches. There are many triggers for copying redo logs from the memory to the storage:
- Every 3 seconds
- By reaching a commit or checkpoint
- If it is 1/3 full
The log writer background process operates the log buffer.
- The most complex structure is a Shared pool, consisting of dozens of substructures. It is managed internally. The most important structures are the Library cache (storing recently executed code in parsed form), the Data dictionary cache (holding metadata such as object definitions: tables descriptions, indexes, users, and so on), the PL/SQL area (holding cached data to prevent repeated reading), and the Result cache (storing result sets temporarily for SQL and PL/SQL).
- The optional Large pool extends the shared pool and covers large objects such as backups. The Java pool is a heap space for Java-stored procedures to be run by the database system. The Streams pool is used by Oracle Streams architecture. The principle is to extract change vectors from the redo log to reconstruct statements to be executed, which requires additional memory.
So, now we know the core elements that make up the database systems, as well as the sub-components of each element. However, what about the interconnection between the instance and the database? Is one instance devoted to only one database? Well, in the following sections, we will highlight individual strategies, focusing on database containerization in the cloud environment. We will list all the key features and principles of database management related to system architecture and approaches.
Database system architecture
Each database system is formed by the instance characterized by the background processes and memory structures and the database itself. In general, various mapping strategies are available to operate the database, representing the ratio between them. Namely, one database can be accessed from multiple instances. However, one instance is dedicated only to one database (container) at a time.
Single-tenant databases
A single-tenant database (also known as a non-Container Database (CDB)) consists of a set of data files, control files, transaction logs (redo log files), parameter files, and metadata files. For the database, one instance is created. Before version 12c, this architecture was the only one used. Instance memory was solidly interconnected with the database (commonly named ORCL by default). The database listener was responsible for connection creation at the session level. Data was interchanged between the processes on the client and server part directly without listener interaction. Oracle has now deprecated this architecture. However, despite being unsupported in version 21c, it is still widely used.
Figure 1.18 – Single-tenant database architecture
The preceding diagram shows the single-tenant database architecture. The mapping between the instance and database is 1:1. The extension of this architecture just involves a single-tenant RAC environment consisting of one database, operated by multiple instances.
Single-tenant RAC databases
Single-tenant (non-container) database can be run on multiple computers (nodes, servers, and hosts) by managing multiple instances operating (mounting, opening, and managing) one database. The main advantages are high performance, availability, fault resistance, and scalability. Thus, new instances with additional memory and CPUs can be added dynamically to serve the workload. The client is navigated to a Single-Client Access Name (SCAN) RAC listener. The listener connects the client to the most suitable instance based on balancing the current workload.
Figure 1.19 – Single-tenant RAC database architecture
The preceding diagram illustrates the RAC architecture of the single-tenant database, meaning that one database is operated by multiple instances. The workload is balanced by SCAN listeners navigating the client to a particular instance listener. By generalizing this architecture, containerization is created. It allows the databases to be attached and detached dynamically from the root container.
Multi-tenant CDBs
Multi-tenant CDBs were introduced in Oracle Database 12c. Also known as root CDBs, these contain a limited set of data files, control files, redo log files, parameter files, and metadata. However, there are no application objects or code in the data files. This kind of database is self-contained and can be mounted and opened without any other physical structures.
A Pluggable Database (PDB) is only made up of data files. They contain application data, objects, and the code itself. No metadata is present, so each PDB needs a container to be plugged into it. This type of database inherits its data repository from the container (such as redo log files, control, files, and parameter files).
The accessibility and mapping must apply certain rules. The root container can manage multiple PDBs simultaneously, but each database is associated with just one container at a time. Thus, one instance is associated just with one (root) container; however, it can reference multiple PDBs. One PDB is part of one container at a time.
Multi-tenant RAC databases
Multi-tenant RAC databases provide a general solution consisting of SCAN listeners.
Each instance has a separate local listener; however, the overall workload is balanced using the SCAN listeners. This architecture is shown in Figure 1.20. The user process is routed to the SCAN listener layer, followed by the transition to the specific instance. The database layer is enclosed by the container with dynamic database attachment and detachment functionalities.
Figure 1.20 – A multi-tenant CDB
The preceding figure illustrates this architecture. The architecture and user management are similar to Figure 1.18, with the difference being that the database view is composed of configuration and parameter files and PDBs that can be managed dynamically. It provides a robust solution by only managing active databases.
Finally, a sharded database architecture will be discussed next, which divides the structure of the database into several fragments.
Sharded databases
Oracle Database version 12.2 introduced sharded databases – a logical database horizontally partitioned across a pool of physical databases. Each database (shard) has its own dedicated server and instance. However, from the user’s point of view, it looks like one single database. Data is distributed across shards using the sharding key of the table. Following the system architecture, there is a connection pool to a sharded database, operated by the shard directors (listeners routing the processing based on the sharding key) and shard catalog. It provides robust linear scalability, fault tolerance (as shards exist on separate independent hardware and do not influence other shards), and data independence (updates can be applied to each shard independently). Moreover, it allows distribution across any number of regions. Each shard can be also be configured with different hardware or storage systems, ensuring compliance with laws and regulations related to the specific data positioning restrictions.
Figure 1.21 – Sharded database
The practical usage of sharding is associated with the elimination of manual data preparation as a necessity, leading to time savings by emphasizing scalability, high availability, response time, and write bandwidth. Sharded tables are distributed and duplicated across servers, which are then associated with the particular application and usage by limiting the amount of data covered by the server. Moreover, each shard can be optimized for specific usage and applications via the parameters, index set, partitioning, and other optimization techniques. A significant advantage is achieved by this parallelism and elasticity – data can be resharded without any downtime.