Sharding recovery
In this section, we will explore different failure types and how we can recover in a sharded environment. Failure in a distributed system can take multiple forms and shapes. In this section, we will cover all possible failure cases, as outlined here:
- A
mongos
process breaks - A
mongod
process breaks - A config server breaks
- A shard goes down
- The entire cluster goes down
In the following sections, we will describe how each component failure affects our cluster and how to recover.
mongos
The mongos
process is a relatively lightweight process that holds no state. In the case that the process fails, we can just restart it or spin up a new process on a different server. It’s recommended that mongos
processes are located in the same server as our application, and so it makes sense to connect from our application using the set of mongos
servers that we have colocated in our application servers to ensure high availability (HA) of mongos...