To design a highly available and scalable architecture, we have to make sure that the servers can handle an increase in load whenever it occurs.
In short, if there is a huge spike in traffic at 10 am on any given day, your environment should be able to absorb the load and keep serving traffic efficiently.
Let's look at a sample architecture that meets this requirement:
The preceding architecture is based on autoscaling. A load balancer at the top distributes traffic to the servers beneath it. In this type of architecture, the server pool scales horizontally: more servers are added whenever the load increases. This ability to scale on demand is denoted by the Scaled systems in the diagram.
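To make the idea concrete, here is a minimal Python sketch of the two pieces described above: a scaling rule that decides how many servers are needed, and a load balancer that spreads requests across them. Everything in it is an assumption made for illustration and does not come from the architecture itself: the names `desired_server_count` and `RoundRobinBalancer`, the `server-N` naming, the proportional scaling rule, the round-robin strategy, and the default limits. Real autoscaling services apply their own policies, cooldowns, and health checks.

```python
import math


def desired_server_count(current_servers, avg_load_per_server,
                         target_load_per_server,
                         min_servers=1, max_servers=10):
    """Return how many servers are needed to bring the average load
    per server down (or up) to the target.

    Hypothetical proportional rule: total load divided by the target
    load per server, rounded up, then clamped to the allowed range.
    """
    if target_load_per_server <= 0:
        raise ValueError("target_load_per_server must be positive")
    total_load = current_servers * avg_load_per_server
    needed = math.ceil(total_load / target_load_per_server)
    return max(min_servers, min(max_servers, needed))


class RoundRobinBalancer:
    """Toy load balancer that hands out servers in rotation."""

    def __init__(self, server_count):
        self.resize(server_count)

    def resize(self, server_count):
        # Horizontal scaling: rebuild the pool with more (or fewer) servers.
        self.servers = [f"server-{i}" for i in range(1, server_count + 1)]
        self._next = 0

    def route(self):
        # Pick the next server in the rotation for an incoming request.
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server


if __name__ == "__main__":
    balancer = RoundRobinBalancer(server_count=2)

    # Traffic spike: each of the 2 servers averages 90% load,
    # while we would like each to sit near 60%.
    count = desired_server_count(current_servers=2,
                                 avg_load_per_server=90,
                                 target_load_per_server=60)
    balancer.resize(count)

    for _ in range(4):
        print(balancer.route())
```

In this sketch, the spike causes the pool to grow from two servers to three, and subsequent requests rotate across all three. When the load drops again, the same rule returns a smaller number, so the pool shrinks back toward `min_servers`.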