In this chapter, we looked at the various ways to deploy a trained model for NLP tasks. First, we learned how quantization can shrink a model and improve its performance, along with other techniques for faster inference. Following that, we saw how TensorFlow Serving can be used to deploy models for fast, scalable inference. Next, cloud deployment through AWS and GCP was explained, and we concluded with a brief overview of deployment on mobile platforms.
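As a quick refresher on the quantization idea, here is a minimal sketch of post-training dynamic quantization using PyTorch's `torch.quantization.quantize_dynamic` API. The tiny `nn.Sequential` network is a hypothetical stand-in for a trained model; in practice, you would pass in your own.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a trained NLP model; substitute your own network.
model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))
model.eval()

# Post-training dynamic quantization: nn.Linear weights are stored as int8,
# and activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    output = quantized(torch.randn(1, 768))
print(output.shape)  # torch.Size([1, 2])
```

Likewise, a served model is queried over HTTP. The following sketch assumes a TensorFlow Serving container is already running locally with a model named `my_model` (a hypothetical name) on the default REST port, 8501; the input shape must match whatever the served model expects.

```python
import json
import requests

# TensorFlow Serving's REST predict endpoint: /v1/models/<name>:predict
url = "http://localhost:8501/v1/models/my_model:predict"
payload = {"instances": [[0.1] * 768]}  # must match the model's input shape

response = requests.post(url, data=json.dumps(payload))
print(response.json())  # e.g. {"predictions": [...]}
```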
In this final chapter, we gave an overview of deploying trained models and serving them in the cloud. Equipped with this knowledge, you can go on to deploy your own models to production environments.