Autoscaling the deployed models
While creating a model server, you are given the option to set the number of replicas, that is, the number of model server instances to create. Changing this value increases or decreases the serving capacity of your model server. Figure 5.12 shows this option as Model server replicas:
Figure 5.12 – Add model server
However, with this approach, you need to decide on the number of serving instances, or replicas, at the time of the model server’s creation. OpenShift provides another construct, called horizontal pod autoscaling, in which an automatic scaler increases or decreases the number of replicas of the model server based on the memory or CPU utilization of the model server instances. This allows you to scale workloads automatically to match demand.
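To make the idea concrete, the following Python sketch shows how a horizontal pod autoscaler might arrive at a replica count and what an autoscaler definition could look like. It follows the scaling rule documented for Kubernetes (desired replicas = ceil(current replicas × current utilization / target utilization)), with a 10% tolerance assumed as the default. The names model-server and model-server-hpa, the replica limits, and the choice of a Deployment as the scaling target are hypothetical values chosen for illustration; the resource that backs your model server may differ.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the replica count a horizontal pod autoscaler would request.

    Utilization values are percentages of the requested CPU or memory.
    The tolerance (assumed 10%) avoids scaling on small fluctuations.
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


def build_hpa_manifest(name, target_deployment, min_replicas, max_replicas,
                       cpu_target_percent):
    """Build an autoscaling/v2 HorizontalPodAutoscaler manifest as a dict.

    All argument values are supplied by the caller; the names used in the
    example below are hypothetical.
    """
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",  # assumption: the target is a Deployment
                "name": target_deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_target_percent,
                        },
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Two replicas running at 90% CPU against a 60% target.
    print(desired_replicas(2, 90, 60))
    hpa = build_hpa_manifest("model-server-hpa", "model-server", 1, 5, 60)
    print(hpa["spec"]["maxReplicas"])
```

In this sketch, two replicas averaging 90% CPU against a 60% target are scaled up to three, while a reading of 63% falls within the tolerance and leaves the replica count unchanged. The manifest dictionary could be serialized to YAML or JSON and applied to the cluster.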
Let’s see how the model server that we defined with the data science project is deployed...