Summary
In this chapter, we explored a popular tool for serving models. TensorFlow Serving can be used to serve any TensorFlow model, and its serving architecture handles model versioning while maintaining high performance. We also introduced the architecture of TensorFlow Serving at a high level. Then, we served a model provided by the TensorFlow repository through Docker and explained how model serving is done step by step. Finally, we demonstrated how we can call the API from Postman to get predictions.
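For quick reference, the following is a minimal sketch of the kind of REST prediction request covered in the chapter, issued from Python instead of Postman; the model name (my_model), port, and input values are placeholders and will depend on how you launched the Docker container.

```python
import json

import requests

# Assumes the container was started along the lines of:
#   docker run -p 8501:8501 -e MODEL_NAME=my_model -t tensorflow/serving
# TensorFlow Serving's REST API serves predictions at
# /v1/models/<MODEL_NAME>:predict (port 8501 by default).
url = "http://localhost:8501/v1/models/my_model:predict"

# "instances" holds a batch of inputs; the shape must match the model's
# expected input signature (placeholder values shown here).
payload = json.dumps({"instances": [[1.0, 2.0, 5.0]]})

response = requests.post(url, data=payload)
print(response.json())  # e.g. {"predictions": [...]}
```

The same URL and JSON body can be pasted into Postman as a POST request, which is exactly what we did earlier in the chapter.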
In the next chapter, we will talk about another popular model-serving tool known as Ray Serve.