Deploying a Deep RL agent as a service
Once you have trained your RL agent to solve a problem or business need, you will want to deploy it as a service rather than ship the trained agent model as a product, for several reasons, including scalability and model staleness. As a service, you can roll out new versions of the agent model centrally; as a product, you would be stuck maintaining and supporting multiple (and older) versions of your agent in the field. What you need is a solid, well-tested mechanism for offering your RL agent as an AI service: one that allows customizable runtimes (different frameworks, CPU/GPU support), easy model upgrades, logging, performance monitoring, and so on.
To meet all of these needs, we will use NVIDIA's Triton Inference Server as the backend for serving our agent. Triton is a unifying inference framework for deploying AI models at scale in production. It supports a wide variety of deep learning frameworks and backends, including TensorFlow, PyTorch, ONNX Runtime, and TensorRT.
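As a concrete sketch of what querying such a service looks like, the snippet below sends an observation to a policy model served by Triton over HTTP using the `tritonclient` Python package. The model name (`rl_agent`), the tensor names (`observation`, `action`), and the 4-dimensional observation shape are illustrative assumptions, not fixed values; they must match whatever your model's `config.pbtxt` declares in the Triton model repository.

```python
# A minimal sketch of querying an RL policy served by Triton over HTTP.
# Assumes Triton is running on localhost with a model named "rl_agent"
# whose config.pbtxt declares an input tensor "observation" and an
# output tensor "action" -- these names and shapes are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a single observation (batch size 1, 4-dimensional state,
# e.g., a CartPole-style environment).
obs = np.random.rand(1, 4).astype(np.float32)

infer_input = httpclient.InferInput("observation", list(obs.shape), "FP32")
infer_input.set_data_from_numpy(obs)

infer_output = httpclient.InferRequestedOutput("action")

# One round trip to the server: observation in, action out.
response = client.infer("rl_agent", inputs=[infer_input],
                        outputs=[infer_output])
action = response.as_numpy("action")
print("Agent action:", action)
```

On the server side, the exported policy would typically live in a versioned model repository, for example `models/rl_agent/1/model.onnx` alongside a `config.pbtxt`, and `tritonserver --model-repository=models` would serve it. This versioned layout is what makes the model upgrades discussed above straightforward: you add a new numbered version directory and Triton can pick it up without redeploying clients.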