You're reading from Mastering MongoDB 7.0 Achieve data excellence by unlocking the full potential of MongoDB

Product type Paperback

Published in Jan 2024

Publisher Packt

ISBN-13 9781835460474

Length 434 pages

Edition 4th Edition

Authors (7):

Marko Aleksendrić

Arek Borucki

Leandro Domingues

Malak Abu Hammad

Elie Hannouch

Rajesh Nair

Rachelle Palmer

Table of Contents (20) Chapters

Preface

1. Chapter 1: Introduction to MongoDB

2. Chapter 2: The MongoDB Architecture

3. Chapter 3: Developer Tools

4. Chapter 4: Connecting to MongoDB

5. Chapter 5: CRUD Operations and Basic Queries

6. Chapter 6: Schema Design and Data Modeling

7. Chapter 7: Advanced Querying in MongoDB

8. Chapter 8: Aggregation

9. Chapter 9: Multi-Document ACID Transactions

10. Chapter 10: Index Optimization

11. Chapter 11: MongoDB Atlas: Powering the Future of Developer Data Platforms

12. Chapter 12: Monitoring and Backup in MongoDB

13. Chapter 13: Introduction to Atlas Search

14. Chapter 14: Integrating Applications with MongoDB

15. Chapter 15: Security

16. Chapter 16: Auditing

17. Chapter 17: Encryption

18. Index

19. Other Books You May Enjoy

Efficiency of the inherent complexity of MongoDB databases

The most interesting part of a modern database is its architecture and the reasons it is built that way. Fundamentally, MongoDB is a distributed system. The database server was originally built on the assumption that most users would run it in its default configuration: a replica set, sometimes also referred to as a cluster. When you explore this architecture in depth, its true complexity becomes apparent.

By default, a MongoDB replica set is a three-node configuration. All three nodes are data-bearing, which means that a complete copy of the database is available on each node. Each copy is hosted on a separate instance or host, and these hosts can be in the same availability zone, data center, or region, or spread across several. This default configuration ensures both redundancy and high availability. Chapter 2, The MongoDB Architecture, discusses replica sets in more detail.
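To make the three-node layout concrete, here is a minimal sketch in Python of a replica set configuration document and the majority arithmetic behind its fault tolerance. The `_id` and `members` fields follow the shape that the `replSetInitiate` command accepts; the set name `rs0`, the host names, and both helper functions are assumptions made for illustration, and nothing here connects to a server.

```python
def build_replica_set_config(name, hosts):
    """Build a config document in the shape the replSetInitiate command accepts."""
    return {
        "_id": name,  # replica set name (hypothetical)
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


def majority(config):
    """Number of members that must agree before a primary can be elected."""
    return len(config["members"]) // 2 + 1


# Hypothetical hosts, one per data-bearing node.
config = build_replica_set_config(
    "rs0",
    [
        "node1.example.net:27017",
        "node2.example.net:27017",
        "node3.example.net:27017",
    ],
)

needed = majority(config)
tolerated_failures = len(config["members"]) - needed
print(f"{len(config['members'])} members, majority {needed}, "
      f"tolerates {tolerated_failures} failure(s)")
```

With three members the majority is two, so the set can lose one node and still elect a primary. That is one reason three is a common default size.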

If the primary node becomes unresponsive or unavailable, a healthy secondary node is elected to become the new primary. This failover between members occurs automatically, and users of the database typically see little or no impact on operations. The election considers many different factors, including node availability, data freshness, and responsiveness. The election process and protocol, while simple to understand at a high level, are very nuanced. But since operations continue with minimal interruption, you rarely need to know or understand these details.

How is this possible?

Behind the scenes, write operations to MongoDB are propagated from the primary node to the secondary nodes via a process called replication. The best way to explain replication is to follow a single write to the database. An inbound write from the client application (your app) is first directed to the primary node. The primary node applies the write to its copy of the database. The write is then recorded in the operations log (oplog), which the secondary nodes tail so that they can apply the same change to their own copies.
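The flow of a single write can be sketched as a toy model in Python. This is not how MongoDB is implemented: the `Primary` and `Secondary` classes, the list-based oplog, and the `pull` method are all assumptions made for illustration. The sketch only shows the idea that secondaries replay an ordered log of the primary's operations.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.oplog = []  # ordered record of every write (toy stand-in for the oplog)

    def write(self, key, value):
        self.data[key] = value           # apply the write to the primary's copy
        self.oplog.append((key, value))  # then record it for the secondaries


class Secondary:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of oplog entries already applied

    def pull(self, primary):
        """Apply every oplog entry not yet seen; return how many were applied."""
        pending = primary.oplog[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
        return len(pending)


primary = Primary()
secondaries = [Secondary(), Secondary()]

primary.write("user:1", {"name": "Ada"})
secondaries[0].pull(primary)  # this secondary keeps up with the primary

primary.write("user:2", {"name": "Lin"})
primary.write("user:1", {"name": "Ada L."})

# The second secondary has not pulled yet, so it replays all three entries.
caught_up = secondaries[1].pull(primary)
secondaries[0].pull(primary)

print(caught_up, all(s.data == primary.data for s in secondaries))
```

Because each secondary remembers how far it has read, a node that falls behind only needs the entries after its last applied position in order to catch up.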

Replication in MongoDB is based on the Raft consensus protocol, although MongoDB's implementation differs from it in places. One example of such a difference is leader election. In the traditional Raft protocol, the leader (the primary node, in MongoDB terms) is elected through a combination of randomized election timeouts and message exchanges. MongoDB adds settings for node priority. This priority is considered along with data freshness and response time when electing a primary node.
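The following Python sketch illustrates how priority and data freshness might combine when choosing a new primary. It is deliberately simplified and is not MongoDB's election protocol, which involves terms, timeouts, and votes exchanged between members. The `Member` class, its fields, and the ranking rule in `pick_primary` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    priority: float       # 0 means the member can never become primary
    last_applied: int     # position of the newest oplog entry the member holds
    reachable: bool = True


def pick_primary(members):
    """Return the name of the member this toy rule would promote, or None."""
    reachable = [m for m in members if m.reachable]
    # Without a majority of the set, no primary can be elected.
    if len(reachable) <= len(members) // 2:
        return None
    candidates = [m for m in reachable if m.priority > 0]
    if not candidates:
        return None
    # Assumed ranking: freshest data first, then highest priority.
    best = max(candidates, key=lambda m: (m.last_applied, m.priority))
    return best.name


# The old primary (node1) has become unreachable.
members = [
    Member("node1", priority=1, last_applied=120, reachable=False),
    Member("node2", priority=1, last_applied=120),
    Member("node3", priority=2, last_applied=118),
]
print(pick_primary(members))  # node2: it holds fresher data than node3

members[2].last_applied = 120
print(pick_primary(members))  # node3: equally fresh, higher priority
```

In this toy rule, priority only decides between members whose data is equally fresh.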

A write operation is usually not applied to all nodes simultaneously. There is a lag, heavily influenced by factors such as network latency, the distance between nodes, hardware configuration, and workload. If one of the mongod nodes falls behind, it catches up or resyncs when it is able to do so, using the oplog to determine the gaps in its operations. MongoDB monitors this replication lag to assess whether the delay between the primary and secondary nodes is acceptable and, if it is not, takes the necessary action. This behavior also sets MongoDB apart from many other databases.
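Replication lag can be estimated by comparing the timestamp of the last operation each member has applied. The Python sketch below does this for a dictionary shaped like the output of the `replSetGetStatus` command, using its `members`, `name`, `stateStr`, and `optimeDate` fields. The sample values, the helper `replication_lag_seconds`, and the 10-second threshold are assumptions for illustration, not recommended settings.

```python
from datetime import datetime, timezone


def replication_lag_seconds(status):
    """Map each secondary's name to how far its last applied op trails the primary's."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical sample, trimmed to the fields the function reads.
sample_status = {
    "members": [
        {"name": "node1.example.net:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2024, 1, 15, 12, 0, 30, tzinfo=timezone.utc)},
        {"name": "node2.example.net:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2024, 1, 15, 12, 0, 28, tzinfo=timezone.utc)},
        {"name": "node3.example.net:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2024, 1, 15, 12, 0, 15, tzinfo=timezone.utc)},
    ]
}

MAX_LAG_SECONDS = 10  # assumed alert threshold for this example

lags = replication_lag_seconds(sample_status)
lagging = [name for name, lag in lags.items() if lag > MAX_LAG_SECONDS]
print(lags)
print(lagging)
```

Against a live deployment, the same dictionary could come from running `replSetGetStatus` through a driver.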

The default configuration of MongoDB is therefore a replica set with three members, in which replication of data and failover between nodes are handled automatically. This configuration is both durable and highly available, and because these mechanisms are automatic, it is easy to use. For developers who require larger, global deployments, MongoDB has a sharded cluster model. The first thing to understand is that a sharded cluster consists of replica sets. Sharding is a way of further dividing your data into partitions, each of which is itself replicated.

Figure 1.1: Replicated partitions set with primary and secondary nodes

If you require a global deployment with multiple terabytes of data, get started with Chapter 2, The MongoDB Architecture. It will cover how to split data, how to migrate data between regions or shards, how to marry data from multiple regions for analytics, and the performance of sharded cluster architectures.
