High availability, scalability, and capacity planning
Highly available systems must also be scalable. The load on most complicated distributed systems can vary dramatically based on the time of day, weekdays versus weekends, seasonal effects, marketing campaigns, and many other factors. Successful systems will have more users over time and accumulate more and more data. That means that the physical resources of the clusters—mostly nodes and storage—will have to grow over time too. If your cluster is under-provisioned, it will not be able to satisfy all the demand and it will not be available because requests will time out or be queued up and not processed fast enough.
This is the realm of capacity planning. One simple approach is to over-provision your cluster. Anticipate the demand and make sure you have enough of a buffer for spikes of activity. But be aware that this approach suffers from several deficiencies:
- For highly dynamic and complicated distributed...