Different database management systems support diverse application scenarios, use cases, and requirements. Database management systems have a long history. First, we will quickly take a look at the recent history, and then explore the market-dominant database management system categories.
Database management systems
A brief history
Broadly, the term database can be used to present a collection of things. Moreover, this term brings to mind many other terms including data, information, data structure, and management. A database can be defined as a collection or repository of data, which has a certain structure, managed by a database management system (DBMS). Data can be structured as tabular data, semi-structured as XML documents, or unstructured data that does not fit a predefined data model.
In the early days, databases were mainly aimed at supporting business applications; this led us to the well-defined relational algebra and relational database systems. With the introduction of object-oriented languages, new paradigms of database management systems appeared such as object-relational databases and object-oriented databases. Also, many businesses as well as scientific applications use arrays, images, and spatial data; thus, new models such as raster, map, and array algebra are supported. Graph databases are used to support graph queries such as the shortest path from one node to another, along with supporting traversal queries easily.
With the advent of web applications such as social portals, it is now necessary to support a huge number of requests in a distributed manner. This has led to another new paradigm of databases called NoSQL (Not Only SQL), which has different requirements such as performance over fault tolerance and horizontal scaling capabilities. In general, the timeline of database evolution was greatly affected by many factors such as the following:
- Functional requirements: The nature of the applications using a DBMS has led to the development of extensions on top of relational databases such as PostGIS (for spatial data) or even dedicated DBMS such as SciDB (for scientific data analytics).
- Nonfunctional requirements: The success of object-oriented programming languages has created new trends such as object-oriented databases. Object relational database management systems have appeared to bridge the gap between relational databases and the object-oriented programming languages. Data explosion and the necessity to handle terabytes of data on commodity hardware have led to columnar databases, which can easily scale up horizontally.
Database categories
Many database models have appeared and vanished such as the network model and hierarchical model. The predominant categories now in the market are relational, object-relational databases, and NoSQL databases. One should not think of NoSQL and SQL databases as rivals--they are complementary to each other. By utilizing different database systems, one can overcome many limitations and get the best of different technologies.
The NoSQL databases
The NoSQL databases are affected by the CAP theorem, also known as Brewer's theorem. In 2002, S. Gilbert and N. Lynch published a formal proof of the CAP theorem in their article, Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In 2009, the NoSQL movement began. Currently, there are over 150 NoSQL databases (nosql-database.org).
The CAP theorem
The CAP theorem states that it is impossible for a distributed computing system to simultaneously provide all three of the following guarantees:
- Consistency: All clients see (immediately) the latest data even in the case of updates.
- Availability: All clients can find a replica of some data even in the case of a node failure. This means that even if some part of the system goes down, the clients can still access the data.
- Partition tolerance: The system continues to work regardless of arbitrary message loss or failure of part of the system.
The choice of which feature to discard determines the nature of the system. For example, one could sacrifice consistency to get a scalable, simple, and high performance database management system. Often, the main difference between a relational database and a NoSQL database is consistency. A relational database enforces ACID (atomicity, consistency, isolation, durability) properties. In contrast, many NoSQL databases adopt the basically available soft-state, eventual consistency (base) model.
NoSQL motivation
A NoSQL database provides a means for data storage, manipulation, and retrieval for non-relational data. The NoSQL databases are distributed, open source, and horizontally scalable. NoSQL often adopts the base model, which prizes availability over consistency, and informally guarantees that if no new updates are made on a data item, eventually all access to that data item will return the latest version of that data item. The advantages of this approach include the following:
- Simplicity of design
- Horizontal scaling and easy replication
- Schema free
- Huge amount of data support
We will now explore a few types of NoSQL databases.
Key-value databases
The key-value store is the simplest database store. In this database model, the storage, as its name suggests, is based on maps or hash tables. Some key-value databases allow complex values to be stored as lists and hash tables. Key-value pairs are extremely fast for certain scenarios, but lack the support for complex queries and aggregation. Some of the existing open source key-value databases are Riak, Redis, Memebase, and MemcacheDB.
Columnar databases
Columnar or column-oriented databases are based on columns. Data in a certain column in a two-dimensional relation is stored together.
HBase is one of the most famous columnar databases. It is based on the Google Bigtable storage system. Column-oriented databases are designed for huge data scenarios, so they scale up easily. For small datasets, HBase is not a suitable architecture. First, the recommended hardware topology for HBase is a five-node or server deployment. Also, it needs a lot of administration and is difficult to master and learn.
Document databases
A document-oriented database is suitable for documents and semi-structured data. The central concept of a document-oriented database is the notion of a document. Documents encapsulate and encode data (or information) in some standard formats or encodings such as XML, JSON, and BSON. Documents do not adhere to a standard schema or have the same structure, so they provide a high degree of flexibility. Unlike relational databases, changing the structure of the document is simple and does not lock the clients from accessing the data.
Document databases merge the power of relational databases and column-oriented databases. They provide support for ad hoc queries and can be scaled up easily. Depending on the design of the document database, MongoDB is designed to handle a huge amount of data efficiently. On the other hand, CouchDB provides high availability even in the case of hardware failure.
Graph databases
Graph databases are based on the graph theory, where a database consists of nodes and edges. The nodes as well as the edges can be assigned data. Graph databases allow traversing between the nodes using edges. As a graph is a generic data structure, graph databases are capable of representing different data. A famous implementation of an open source commercially supported graph database is Neo4j.