Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Mastering RethinkDB

You're reading from   Mastering RethinkDB Master the skills of building real-time apps dramatically easier with open source, scalable database - RethinkDB

Arrow left icon
Product type Paperback
Published in Dec 2016
Publisher Packt
ISBN-13 9781786461070
Length 330 pages
Edition 1st Edition
Arrow right icon
Author (1):
Arrow left icon
Shahid Shaikh Shahid Shaikh
Author Profile Icon Shahid Shaikh
Shahid Shaikh
Arrow right icon
View More author details
Toc

Table of Contents (11) Chapters Close

Preface 1. The RethinkDB Architecture and Data Model 2. RethinkDB Query Language FREE CHAPTER 3. Data Exploration Using RethinkDB 4. Performance Tuning in RethinkDB 5. Administration and Troubleshooting Tasks in RethinkDB 6. RethinkDB Deployment 7. Extending RethinkDB 8. Full Stack Development with RethinkDB 9. Polyglot Persistence Using RethinkDB 10. Using RethinkDB and Horizon

Query execution in RethinkDB

RethinkDB query engine is a very critical and important part of RethinkDB. RethinkDB performs various computations and internal logic operations to maintain high performance along with good throughput of the system.

Refer to the following diagram to understand query execution:

Query execution in RethinkDB

RethinkDB, upon arrival of a query, divides it into various stacks. Each stack contains various methods and internal logic to perform its operation. Each stack consists of various methods, but there are three core methods that play key roles:

  • The first method decides how to execute the query or subset of the query on each server in a particular cluster
  • The second method decides how to merge the data coming from various clusters in order to make sense of it
  • The third method, which is very important, deals with transmission of that data in streams rather than as a whole

To speed up the process, these stacks are transported to every related server and each server begins to evaluate it in parallel to other servers. This process runs recursively in order to merge the data to stream to the client.

The stack in the node grabs the data from the stack after it and performs its own method of execution and transformation. The data from each server is then combined into a single result set and streamed to the client.

In order to speed up the process and maintain high performance, every query is completely parallelized across various relevant clusters. Thus, every cluster then performs the query execution and the data is again merged together to make a single result set.

RethinkDB query engine maintains efficiency in the process too; for example, if a client only requests a certain result that is not in a shared or replicated server, it will not execute the parallel operation and just return the result set. This process is also referred to as lazy execution.

To maintain concurrency and high performance of query execution, RethinkDB uses block-level Multiversion Concurrency Control (MVCC). If one user is reading some data while other users are writing on it, there is a high chance of inconsistent data, and to avoid that we use a concurrency control algorithm. One of the simplest and commonly used methods method by SQL databases is to lock the transaction, that is, make the user wait if a write operation is being performed on the data. This slows down the system, and since big data promises fast reading time, this simply won't work.

Multiversion concurrency control takes a different approach. Here each user will see the snapshot of the data (that is, child copies of master data), and if there are some changes going on in the master copy, then the child copies or snapshot will not get updated until the change has been committed:

Query execution in RethinkDB

RethinkDB does use block-level MVCC and this is how it works. Whenever there is any update or write operation being performed during the read operation, RethinkDB takes a snapshot of each shard and maintains a different version of a block to make sure every read and write operation works in parallel. RethinkDB does use exclusive locks on block level in case of multiple updates happening on the same document. These locks are very short in duration because they all are cached; hence it always seems to be lock-free.

RethinkDB provides atomicity of data as per the JSON document. This is different from other NoSQL systems; most NoSQL systems provide atomicity to each small operation done on the document before the actual commit. RethinkDB does the opposite, it provides atomicity to a document no matter what combination of operations is being performed.

For example, a user may want to read some data (say, the first name from one document), change it to uppercase, append the last name coming from another JSON document, and then update the JSON document. All of these operations will be performed automatically in one update operation.

RethinkDB limits this atomicity to a few operations. For example, results coming from JavaScript code cannot be performed atomically. The result of a subquery is also not atomic. Replace cannot be performed atomically.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image