Summary
In this chapter, we have gone through three different approaches to taking PyTorch models to production, starting with the easiest but least performant one: using Flask. Then we moved to the MXNet model server, a pre-built, optimized server implementation that can be controlled through its management API. The MXNet model server is useful for people who don't need a lot of complexity but do need an efficient server implementation that can be scaled as required.
Lastly, we used TorchScript to create the most efficient version of our model and imported it in C++. Those who are ready to take on the complexity of building and maintaining a server in a low-level language such as C++, Go, or Rust can take this approach and build a custom server, at least until a better runtime becomes available that can read the script module and serve it, as MXNet does for ONNX models.
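As a reminder of the core TorchScript step, the sketch below traces a small stand-in model (the `TinyNet` class and file name are hypothetical, not from the chapter) into a script module and serializes it, producing the file a C++ server would load with `torch::jit::load`:

```python
import torch
import torch.nn as nn


# A tiny stand-in model for illustration; substitute your real model here.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))


model = TinyNet().eval()
example = torch.randn(1, 4)

# Trace the model with an example input to produce a ScriptModule,
# then serialize it to disk for consumption outside Python.
traced = torch.jit.trace(model, example)
traced.save("tiny_net.pt")

# On the C++ side, the same file can be loaded with:
#   torch::jit::Module module = torch::jit::load("tiny_net.pt");
```

Tracing records the operations executed for the example input, so models with data-dependent control flow may need `torch.jit.script` instead.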
The year 2018 was the year of model servers; numerous model servers appeared from different organizations with...