Mastering MongoDB 3.x: An expert's guide to building fault-tolerant MongoDB applications

Product type Paperback
Published in Nov 2017
Publisher Packt
ISBN-13 9781783982608
Length 342 pages
Edition 1st Edition
Author (1):
Alex Giamas

SQL and NoSQL evolution

Structured Query Language (SQL) existed even before the WWW. Dr. E. F. Codd originally published the paper A Relational Model of Data for Large Shared Data Banks in June 1970 in Communications of the ACM, the journal of the Association for Computing Machinery (ACM). SQL was initially developed at IBM by Chamberlin and Boyce in 1974. Relational Software (now Oracle Corporation) was the first to develop a commercially available implementation of SQL, targeted at United States governmental agencies.

The first American National Standards Institute (ANSI) SQL standard came out in 1986 and since then there have been eight revisions with the most recent being published in 2016 (SQL:2016).

SQL was not particularly popular at the start of the WWW. Static content could simply be hardcoded into the HTML page without much fuss. However, as the functionality of websites grew, webmasters wanted to generate web page content from offline data sources, so that it could change over time without redeploying code.

Common Gateway Interface (CGI) scripts in Perl or Unix shell drove the early database-driven websites of Web 1.0. With Web 2.0, the web evolved from directly injecting SQL results into the browser to using two- and three-tier architectures that separated views from business and model logic, allowing SQL queries to be modular and isolated from the rest of a web application.

Not only SQL (NoSQL), on the other hand, is much more modern and followed the evolution of the web, rising at the same time as Web 2.0 technologies. The term was first coined by Carlo Strozzi in 1998 for his open source database that did not follow the SQL standard but was still relational.

This is not what we currently expect from a NoSQL database. Johan Oskarsson, a developer at Last.fm at the time, reintroduced the term in early 2009 to group a set of distributed, non-relational data stores that were being developed. Many of them were based on Google's Bigtable and MapReduce papers or on Amazon's Dynamo, a highly available key-value storage system.

NoSQL foundations grew upon relaxed ACID (atomicity, consistency, isolation, durability) guarantees in favor of performance, scalability, flexibility, and reduced complexity. Most NoSQL databases have, in one way or another, tried to provide as many of the previously mentioned qualities as possible, with some even offering tunable guarantees to the developer.

Timeline of SQL and NoSQL evolution

MongoDB evolution

10gen started developing a cloud computing stack in 2007 and soon realized that the most important innovation was centered around the document-oriented database that they built to power it, MongoDB. MongoDB was initially released on August 27th, 2009.

Version 1 of MongoDB was pretty basic in terms of features, authorization, and ACID guarantees, and made up for these shortcomings with performance and flexibility.

The following sections list the major features along with the version in which each was introduced.

Major feature set for versions 1.0 and 1.2

  • Document-based model
  • Global lock (process level)
  • Indexes on collections
  • CRUD operations on documents
  • No authentication (authentication was handled at the server level)
  • Master/slave replication
  • MapReduce (introduced in v1.2)
  • Stored JavaScript functions (introduced in v1.2)

Version 2

  • Background index creation (since v.1.4)
  • Sharding (since v.1.6)
  • More query operators (since v.1.6)
  • Journaling (since v.1.8)
  • Sparse and covered indexes (since v.1.8)
  • Compact command to reduce disk usage
  • More efficient memory usage
  • Concurrency improvements
  • Index performance enhancements
  • Replica sets are now more configurable and data center aware
  • MapReduce improvements
  • Authentication (since 2.0 for sharding and most database commands)
  • Geospatial features introduced

Version 3

  • Aggregation framework (since v.2.2) and enhancements (since v.2.6)
  • TTL collections (since v.2.2)
  • Concurrency improvements, including database-level locking (since v.2.2)
  • Text search (since v.2.4) and integration (since v.2.6)
  • Hashed index (since v.2.4)
  • Security enhancements, role-based access (since v.2.4)
  • V8 JavaScript engine instead of SpiderMonkey (since v.2.4)
  • Query engine improvements (since v.2.6)
  • Pluggable storage engine API
  • WiredTiger storage engine introduced, with document-level locking, while the previous storage engine (now called MMAPv1) supports collection-level locking

Version 3+

  • Replication and sharding enhancements (since v.3.2)
  • Document validation (since v.3.2)
  • Aggregation framework enhanced operations (since v.3.2)
  • Multiple storage engines (since v.3.2, only in Enterprise Edition)

MongoDB evolution diagram

As one can observe, version 1 was pretty basic, whereas version 2 introduced most of the features present in the current version such as sharding, usable and special indexes, geospatial features, and memory and concurrency improvements.

On the way from version 2 to version 3, the aggregation framework was introduced, mainly as a supplement to the ageing MapReduce framework, which was never up to par with dedicated frameworks like Hadoop. Text search was added next, and performance, stability, and security were slowly but surely improved to adapt to the increasing enterprise load of customers using MongoDB.

With WiredTiger's introduction in version 3, locking became much less of an issue for MongoDB, as it was brought down from the process level (global lock) to the document level, almost the most granular level possible.

In its current state, MongoDB is a database that can handle loads ranging from startup MVPs and POCs to enterprise applications with hundreds of servers.

MongoDB for SQL developers

MongoDB was developed in the Web 2.0 era. By then, most developers had been using SQL or object-relational mapping (ORM) tools from their language of choice to access RDBMS data. As such, these developers needed an easy way to get acquainted with MongoDB from their relational background.

Thankfully, there have been several attempts at SQL-to-MongoDB cheat sheets that explain MongoDB terminology in SQL terms.

At a high level, there are:

  • Databases and indexes, just as in SQL databases
  • Collections (SQL tables)
  • Documents (SQL rows)
  • Fields (SQL columns)
  • Embedded and linked documents (SQL Joins)
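
The last mapping is the least obvious one, so here is a minimal sketch of it. It uses plain JavaScript objects standing in for documents, not the MongoDB driver or shell; the `address`, `departments`, `departmentId`, and `resolveDepartment` names are hypothetical and chosen only for illustration.

```javascript
// Plain JavaScript objects standing in for MongoDB documents (no server needed).

// Embedded: the address lives inside the employee document,
// so a single read returns everything and no join is required.
const employeeEmbedded = {
  _id: 1,
  name: "Alex",
  age: 36,
  address: { city: "London", country: "UK" } // hypothetical sub-document
};

// Linked: the employee stores only a reference to a document
// that lives in another (hypothetical) collection.
const departments = [{ _id: 10, name: "Engineering" }];
const employeeLinked = { _id: 1, name: "Alex", age: 36, departmentId: 10 };

// Resolving a link is a second lookup performed by the application,
// which is the rough equivalent of a SQL join.
function resolveDepartment(employee, departmentCollection) {
  const department = departmentCollection.find(
    d => d._id === employee.departmentId
  );
  return { ...employee, department: department || null };
}

const joined = resolveDepartment(employeeLinked, departments);
console.log(employeeEmbedded.address.city);
console.log(joined.department.name);
```

Embedding tends to suit data that is always read together with its parent, while linking keeps shared data in one place at the cost of an extra lookup.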

Some more examples of common operations:

| SQL | MongoDB |
|---|---|
| Database | Database |
| Table | Collection |
| Index | Index |
| Row | Document |
| Column | Field |
| Joins | Embed in document or link via DBRef |
| CREATE TABLE employees (name VARCHAR(100), age INT) | db.createCollection("employees") |
| INSERT INTO employees VALUES ('Alex', 36) | db.employees.insert({name: "Alex", age: 36}) |
| SELECT * FROM employees | db.employees.find() |
| SELECT * FROM employees LIMIT 1 | db.employees.findOne() |
| SELECT DISTINCT name FROM employees | db.employees.distinct("name") |
| UPDATE employees SET age = 37 WHERE name = 'Alex' | db.employees.update({name: "Alex"}, {$set: {age: 37}}, {multi: true}) |
| DELETE FROM employees WHERE name = 'Alex' | db.employees.remove({name: "Alex"}) |
| CREATE INDEX idx_employees_name ON employees (name ASC) | db.employees.createIndex({name: 1}) |
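
One detail in the update example deserves a closer look: a SQL UPDATE changes every matching row, whereas MongoDB's update() changes only the first matching document unless multi: true is passed. The sketch below imitates that behavior over an in-memory array. It is not the MongoDB driver; `updateDocs` is a hypothetical helper that supports only equality queries and $set.

```javascript
// In-memory stand-in for a collection; NOT the MongoDB driver.
const employees = [
  { name: "Alex", age: 36 },
  { name: "Alex", age: 41 },
  { name: "Maria", age: 29 }
];

// Hypothetical helper imitating update(query, {$set: ...}, {multi})
// for simple equality queries only.
function updateDocs(collection, query, update, options = {}) {
  const matches = doc =>
    Object.keys(query).every(key => doc[key] === query[key]);
  let touched = 0;
  for (const doc of collection) {
    if (!matches(doc)) continue;
    Object.assign(doc, update.$set); // apply the $set fields in place
    touched += 1;
    if (!options.multi) break; // default: stop after the first match
  }
  return touched;
}

// Without multi, only the first "Alex" is updated.
const withoutMulti = updateDocs(
  employees, { name: "Alex" }, { $set: { age: 37 } }
);
const agesAfterFirstCall = employees.map(doc => doc.age);

// With multi: true, every "Alex" is updated, like the SQL statement.
const withMulti = updateDocs(
  employees, { name: "Alex" }, { $set: { age: 37 } }, { multi: true }
);
const agesAfterSecondCall = employees.map(doc => doc.age);

console.log(withoutMulti, agesAfterFirstCall);
console.log(withMulti, agesAfterSecondCall);
```

This is why the MongoDB side of the UPDATE row carries the extra {multi: true} argument: without it, the two statements would not be equivalent when several employees share a name.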

MongoDB for NoSQL developers

As MongoDB has grown from being a niche database solution to the Swiss Army knife of NoSQL technologies, more developers are coming to it from a NoSQL background as well.

Setting the SQL to NoSQL differences aside, users coming from columnar databases face the most challenges. As Cassandra and HBase are the most popular column-oriented database management systems, we will examine the differences and how a developer can migrate a system to MongoDB.

  • Flexibility: MongoDB's notion of documents that can contain sub-documents nested in complex hierarchies is really expressive and flexible. This is similar to the comparison between MongoDB and SQL, with the added benefit that MongoDB maps more easily to plain old objects from any programming language, allowing for easy deployment and maintenance.
  • Flexible query model: A user can selectively index some parts of each document, query based on attribute values, regular expressions, or ranges, and have as many properties per object as needed by the application layer. Primary and secondary indexes, as well as special types of indexes such as sparse ones, can help greatly with query efficiency. Using a JavaScript shell with MapReduce makes it really easy for most developers and many data analysts to quickly take a look into data and get valuable insights.
  • Native aggregation: The aggregation framework provides an ETL pipeline for users to extract and transform data from MongoDB and either load it in a new format or export it from MongoDB to other data sources. This can also help data analysts and scientists get the slice of data they need, performing data wrangling along the way.
  • Schemaless model: This is a result of MongoDB's design philosophy to give applications the power and responsibility to interpret different properties found in a collection's documents. In contrast to Cassandra's or HBase's schema-based approach, in MongoDB a developer can store and process dynamically generated attributes.
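
To make the last two points concrete, here is a minimal sketch of a two-stage pipeline over documents that do not share the same fields. It runs in plain JavaScript with an array standing in for a collection; in MongoDB itself, the same idea would be expressed with the aggregation framework's $match and $group stages. The `events` data and the `matchClicks`, `groupByUser`, and `runPipeline` names are hypothetical.

```javascript
// Documents in one collection need not share the same fields (schemaless model).
const events = [
  { type: "click", user: "alex", durationMs: 120 },
  { type: "click", user: "maria", durationMs: 80, campaign: "spring" }, // extra field
  { type: "view", user: "alex" }, // no durationMs at all
  { type: "click", user: "alex", durationMs: 200 }
];

// Stage 1, comparable to $match: keep only click events.
const matchClicks = docs => docs.filter(doc => doc.type === "click");

// Stage 2, comparable to $group with $sum: total duration per user.
// Missing durationMs values are treated as 0 by the application.
const groupByUser = docs => {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.user, (totals.get(doc.user) || 0) + (doc.durationMs || 0));
  }
  return [...totals].map(([user, totalMs]) => ({ _id: user, totalMs }));
};

// Run the stages in order, each consuming the previous stage's output.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const result = runPipeline(events, [matchClicks, groupByUser]);
console.log(result);
```

Note that it is the application code, not the data store, that decides how to interpret the optional `campaign` and `durationMs` fields, which is the responsibility the schemaless model hands to the developer.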