Understanding failure modes for distributed applications
Kubernetes components (and applications running on Kubernetes) are distributed by default whenever they run more than one replica. This distribution can produce failure modes that are hard to reproduce and hard to debug.
For this reason, applications on Kubernetes are less prone to failure if they are stateless, with their state offloaded to a cache or database running outside of Kubernetes. Kubernetes primitives such as StatefulSets and PersistentVolumes make it much easier to run stateful applications on Kubernetes, and the experience improves with every release. Still, deciding to run fully stateful applications on Kubernetes introduces complexity and therefore more potential for failure.
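As a rough illustration of why in-process state complicates multi-replica deployments, the following Python sketch simulates two replicas behind a round-robin load balancer. The class names (`ExternalStore`, `StatefulReplica`, `StatelessReplica`) and the `serve` helper are hypothetical and only model the idea; they are not Kubernetes APIs, and the in-memory `ExternalStore` merely stands in for a real cache or database.

```python
from dataclasses import dataclass
from itertools import cycle


class ExternalStore:
    """Hypothetical stand-in for a cache or database running outside the cluster."""

    def __init__(self) -> None:
        self._data: dict[str, int] = {}

    def increment(self, key: str) -> int:
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


@dataclass
class StatefulReplica:
    """Keeps the counter in its own memory, so each replica tracks a separate value."""

    visits: int = 0

    def handle(self) -> int:
        self.visits += 1
        return self.visits


@dataclass
class StatelessReplica:
    """Holds no data of its own; every request is recorded in the shared store."""

    store: ExternalStore

    def handle(self) -> int:
        return self.store.increment("visits")


def serve(replicas, requests: int) -> list[int]:
    """Distribute requests round-robin across replicas, as a load balancer might."""
    picker = cycle(replicas)
    return [next(picker).handle() for _ in range(requests)]


# Two replicas with in-memory state: their counters diverge from the true total.
stateful_counts = serve([StatefulReplica(), StatefulReplica()], 4)

# Two replicas sharing an external store: every request sees a consistent count.
store = ExternalStore()
stateless_counts = serve([StatelessReplica(store), StatelessReplica(store)], 4)

print(stateful_counts)   # [1, 1, 2, 2]
print(stateless_counts)  # [1, 2, 3, 4]
```

With in-memory state, no single replica knows that four requests arrived, and restarting a replica would lose its count entirely. Offloading the counter to a shared store keeps the replicas interchangeable, at the cost of depending on that store being available.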
Failures in distributed applications can be introduced by many different factors. Factors as simple as an unreliable network or bandwidth constraints can cause major issues....