As shown in the diagram at the beginning of this chapter, SavedModel is the input to a vast ecosystem of deployment platforms, each created to satisfy a different range of use cases:
- TensorFlow Serving: This is Google's official solution for serving machine learning models. It supports model versioning, allows multiple models to be deployed in parallel, and ensures that concurrently served models achieve high throughput with low latency thanks to its full support for hardware accelerators (GPUs and TPUs). TensorFlow Serving is not merely a deployment platform but an entire ecosystem built around TensorFlow, written in highly efficient C++ code. Currently, this is the solution Google itself uses to run tens of millions of inferences per second on Google Cloud's ML platform (a minimal export-and-query sketch follows this list).
- TensorFlow Lite: This is the deployment platform of choice...
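To make the hand-off between SavedModel and TensorFlow Serving concrete, here is a minimal sketch that exports a Keras model as a SavedModel and queries a locally running TensorFlow Serving instance through its REST API. The model name `my_model`, the export path `/tmp/my_model`, and the local port 8501 are illustrative assumptions, not values from this chapter:

```python
import json
import urllib.request

import numpy as np
import tensorflow as tf

# Build a trivial Keras model and export it as a SavedModel.
# TensorFlow Serving expects a numeric version directory under
# the model root, hence the trailing "/1".
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
tf.saved_model.save(model, "/tmp/my_model/1")

# Assuming a TensorFlow Serving container is already running locally,
# for example:
#   docker run -p 8501:8501 \
#     --mount type=bind,source=/tmp/my_model,target=/models/my_model \
#     -e MODEL_NAME=my_model -t tensorflow/serving
# we can send a prediction request to its REST endpoint.
payload = json.dumps({"instances": np.ones((1, 4)).tolist()}).encode()
request = urllib.request.Request(
    "http://localhost:8501/v1/models/my_model:predict",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.load(response))  # e.g. {"predictions": [[0.42]]}
```

Note that the client side needs nothing beyond standard HTTP and JSON: the SavedModel carries the computation graph and weights, and TensorFlow Serving handles versioning and batching on the server.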