The history and evolution of databases
A database is a collection of data that can be organized, managed, modified, and retrieved using a computer. The system that helps with managing data in a database is called a database management system (DBMS).
In the 1950s and 1960s, several advancements were made in terms of processors, storage, memory, and networks. We also had our first programming languages, COBOL and FORTRAN. The development of hard disk drives for data storage further spurred the development of databases. Around the same time, the first notion of a modern-day computer with a mouse and graphical user interface came into existence, making it easy for the general public to consume it. In this section, we will discuss how various types of databases evolved.
SQL
The first database was designed by Charles William Bachman III, an American computer scientist. In 1963, he developed the Integrated Data Store (IDS), which gave rise to the concept of the navigational database. In navigational databases, we can find records by chasing references from other objects. For example, let's say that in a school database, you want to find all the students from a specific grade in a specific school. In a navigational database, first, you have to go to the group of students that belong to a particular school and then to the group that belongs to a particular grade. So, records can be accessed by hierarchical navigation. Based on IDS, Bachman later developed the CODASYL database model in 1969. CODASYL stands for Conference/Committee on Data Systems Languages, which was a consortium to guide the development of programming languages. Around the same time Edgar F. Codd, an IBM employee, developed the IBM Information Management System (IMS), which was based on the hierarchical database model. A hierarchical database model is a data model in which the data is designed in a tree-like structure. In 1970, Donald D. Chamberlin and Raymond F. Boyce developed Structured Query Language (SQL) based on what they'd learned about IMS. They initially called it Structured English Query Language (SEQUEL), which System R was later developed with by a group at the IBM San Jose research laboratory. In 1976, QUEL, which is a relational database query language designed by Michael Ralph Stonebraker, was developed as part of the Interactive Graphics Retrieval System (INGRES) database management system at the University of California, Berkeley.
Based on QUEL and SQL, several databases were implemented. Some of the most prominent ones include Post Ingres (Postgres), Sybase, Microsoft SQL, IBM DB2, Oracle, MariaDB, and MySQL.
Object-oriented databases
In the 1980s, object-oriented database systems (OODBMSes) grew in popularity. In OODBMSes, information is represented as objects compared to tables in relational databases. Some of the important ones include Gemstone/S
, Objectivity/DB
, InterSystems Cache
, Perst
, ZODB
, Wakanda
, ObjectDB
, ODABA
, and Realm
.
NoSQL
The concept of non-SQL or non-relational databases has existed since the 1960s, but the term NoSQL became has much more popular in the last decade. NoSQL databases focus on performance and scaling and mostly rely on a non-relational data model such as a document, key-value, wide-column, or graph to organize the data. Some of the most popular ones in this category include Cassandra, MongoDB, Couchbase, Dynamo, FoundationDB, Neo4j, and Hbase.
NewSQL
With the introduction of the on-demand availability of compute, storage, and network resources and the pay-as-you-go model, which is collectively known as cloud computing, the amount of data that we collect, process, manage, and analyze has been growing exponentially. Although it was relatively easier for some of the NoSQL databases to adapt to the cloud, it is still much harder for traditional SQL databases to do so. Many of them are better suited for vertical scaling and do not consider geographically distributed data, the shared-nothing architecture, and enormous scale as part of their initial design. This created a void. We needed SQL databases that are cloud-native, scale well with data growth, and are easy to manage. Many companies developed in-house solutions on top of existing SQL databases:
- Facebook developed TAO, a NoSQL graph API built on top of sharded MySQL.
- YouTube developed Vitess to easily scale and manage MySQL clusters.
- Dropbox developed Edgestore, a metadata store to power their services and products, which again was built on top of MySQL.
- GreenPlum developed a massively parallel data platform by the same name for analytics, machine learning, and AI on top of Postgres.
However, it was still relatively hard and painful to manage the data as the underlying database was not built to scale.
In 2012, Google published a seminal paper on Google Spanner: a globally distributed database service and storage solution. Spanner essentially combined the important features of SQL databases such as ACID transactions, strongly consistent reads, and the SQL interface with some of the features that were only available with NoSQL databases, such as scaling across geographical locations, multi-site replication, and failover. It created a new category of databases called NewSQL, which is meant to indicate a combination of SQL features at NoSQL scale. YugabyteDB and CockroachDB were developed later, both of which got their inspiration from Google Spanner.