Efficiency with TorchScript
We have set up a simple Flask application server to serve our model, and we have implemented the same model using the MXNet model server. But if we need to move away from the Python world and build a highly efficient server in C++, Go, or another efficient language, PyTorch offers TorchScript, which can generate the most efficient form of your model, readable in C++.
Now the question is: isn't this what we did with ONNX; that is, creating another IR from the PyTorch model? Yes, the processes are similar, but the difference is that ONNX creates its optimized IR using tracing; that is, it passes a dummy input through the model and, while the model is being executed, records the PyTorch operations, which it then converts to the intermediate IR.
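As a minimal sketch of the tracing flow, we can trace a small network with `torch.jit.trace` and save the resulting TorchScript module, which a C++ program could later load with `torch::jit::load`. The model and file names here are illustrative, not from the text:

```python
import torch
import torch.nn as nn

# A small illustrative feed-forward model.
class TwoLayerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TwoLayerNet().eval()
dummy = torch.randn(1, 4)               # the dummy input passed through the model
traced = torch.jit.trace(model, dummy)  # records the executed operations as a graph

traced.save("two_layer.pt")  # serialized form loadable from C++
print(traced.code)           # inspect the recorded TorchScript
```

Because tracing records only the operations executed for the dummy input, the traced module behaves identically to the original for inputs that follow the same code path.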
There is a problem with this approach: if the model is data-dependent, like loops in RNNs, or if the if/else condition is based on the input, then tracing can't really get that right...