TensorFlow Serving (TFS) is a high-performance serving system for machine learning models in production. It integrates out of the box with models built using TensorFlow.
In TFS, a model is composed of one or more servables. A servable is the underlying object that clients use to perform a computation, such as a lookup or an inference. Examples of servables include:
- A lookup table for embedding lookups
- A single model returning predictions
- A tuple of models returning a tuple of predictions
- A shard of lookup tables or models
The manager component handles the full lifecycle of servables: loading them, serving them, and unloading them.
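To make the relationship between servables and the manager concrete, here is a minimal Python sketch. It is not the TensorFlow Serving API: the `Servable` and `Manager` classes, their method names, and the sample data are all hypothetical, and serving the highest loaded version by default is a simplifying assumption made for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Servable:
    """Hypothetical stand-in for a servable: a named, versioned computation."""
    name: str
    version: int
    compute: Callable[[object], object]


class Manager:
    """Toy manager (an assumption, not the TFS class): load, serve, unload."""

    def __init__(self) -> None:
        # Loaded servables, keyed by (name, version).
        self._loaded: Dict[Tuple[str, int], Servable] = {}

    def load(self, servable: Servable) -> None:
        self._loaded[(servable.name, servable.version)] = servable

    def unload(self, name: str, version: int) -> None:
        # Unloading a servable that is not loaded is a no-op.
        self._loaded.pop((name, version), None)

    def serve(self, name: str, request: object,
              version: Optional[int] = None) -> object:
        versions = [v for (n, v) in self._loaded if n == name]
        if not versions:
            raise KeyError(f"no loaded servable named {name!r}")
        # Assumption: default to the highest loaded version.
        chosen = max(versions) if version is None else version
        if (name, chosen) not in self._loaded:
            raise KeyError(f"version {chosen} of {name!r} is not loaded")
        return self._loaded[(name, chosen)].compute(request)


# Made-up embedding table used only for this example.
EMBEDDINGS = {"cat": [0.1, 0.2], "dog": [0.3, 0.4]}

manager = Manager()
# A lookup table for embedding lookups.
manager.load(Servable("embeddings", 1, EMBEDDINGS.__getitem__))
# A single model returning predictions.
manager.load(Servable("linear", 1, lambda x: 2 * x + 1))
# A tuple of models returning a tuple of predictions.
manager.load(Servable("pair", 1, lambda x: (2 * x + 1, x * x)))

print(manager.serve("embeddings", "cat"))  # [0.1, 0.2]
print(manager.serve("linear", 3))          # 7
print(manager.serve("pair", 3))            # (7, 9)
```

Loading a second version of `"linear"` and later unloading it would switch the default request to the new version and back again, which mirrors the load, serve, and unload lifecycle described above.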
The internal architecture and workflow of TensorFlow Serving are described in the architecture overview: https://www.tensorflow.org/serving/architecture_overview.