Knowledge about distributed systems is, by the nature of those systems, often distributed itself. Different people are responsible for the development, configuration, deployment, and administration of such systems and their infrastructure. Different components are often upgraded by different people, not necessarily in sync. There's also the so-called bus factor: the number of key project members who would have to disappear suddenly (say, by being hit by a bus) before the project stalls. The lower that number, the greater the risk.
How do we deal with all of this? The answer consists of a few parts. One of them is the DevOps culture. Close collaboration between development and operations spreads knowledge about the system across more people, which lowers the bus-factor risk. Introducing continuous delivery can help with upgrading the project and keeping it available while upgrades roll out.
Try to design your system to be loosely coupled and backward compatible, so that upgrading one component doesn't force upgrades of the others. An easy way to decouple is by introducing...