We explored quite a few techniques, processes, and tools that can help us build a self-sufficient system applied to services. Docker Swarm provides self-healing, and we created our own system for self-adaptation. By now, we should be fairly confident with our services and the time has come to explore how to accomplish similar goals applied to infrastructure.
The system should be capable of recreating failed nodes, of upgrading without downtime, and to scale servers depending on the fluctuating needs. We cannot explore those topics using clusters based on Docker machine nodes running locally. The capacity of our laptops is somewhat limited so we cannot scale nodes to a greater number. Even if we could, the infrastructure we'll use for production clusters is quite different. We'll need an API that will allow our system to communicate with...