In this chapter, you will learn how to deploy trained deep learning models to production environments on various platforms, such as the cloud and mobile devices. For cloud deployment, latency and throughput are the key metrics. Latency, the time taken to serve a single request, should be as low as possible. Throughput, the number of requests served per unit of time, should be as high as possible. Both depend largely on the model and the hardware, and several optimizations are available for CPUs and GPUs. For mobile platforms, inference speed and energy consumption matter most.
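To make the difference between latency and throughput concrete, here is a minimal sketch in plain Python that times repeated calls to a prediction function on a batch of inputs. The names `measure` and `toy_model` are hypothetical and not part of TensorFlow. `toy_model` is an assumed stand-in for a real model that simulates a fixed per-call overhead with `time.sleep` plus a little work per sample; real numbers will depend on your model and hardware.

```python
import statistics
import time
from typing import Callable, Sequence


def measure(predict: Callable[[Sequence[float]], list],
            batch: Sequence[float], runs: int = 50) -> dict:
    """Time repeated calls to `predict` on one batch.

    Latency is the median wall-clock time of a single call;
    throughput is the number of samples processed per second.
    """
    # Warm-up call so one-off setup costs do not skew the timings.
    predict(batch)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(batch)
        timings.append(time.perf_counter() - start)
    latency = statistics.median(timings)
    return {
        "batch_size": len(batch),
        "latency_s": latency,
        "throughput_per_s": len(batch) / latency,
    }


def toy_model(batch: Sequence[float]) -> list:
    # Hypothetical stand-in for a real model: a fixed per-call
    # overhead (2 ms) plus a small amount of work per sample.
    time.sleep(0.002)
    return [sum(i * x for i in range(200)) for x in batch]


if __name__ == "__main__":
    for size in (1, 8, 64):
        r = measure(toy_model, [1.0] * size, runs=20)
        print(f"batch={r['batch_size']:3d}  "
              f"latency={r['latency_s'] * 1000:.2f} ms  "
              f"throughput={r['throughput_per_s']:.0f} samples/s")
```

With a fixed per-call overhead, larger batches typically raise throughput while each call takes longer. This is the latency/throughput trade-off that the tuning techniques later in this chapter try to balance.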
This chapter covers the following topics, which will help you meet your deployment goals:
- Increasing performance by changing models
- Using the TensorFlow serving tool
- Deploying to cloud services, such as AWS, GCP, and Azure
- Deploying to mobile devices, such as iPhone, Android, and Tegra
- The impact of hardware on performance