In this chapter, we covered several aspects of performance. First, we learned how to properly measure the inference speed of a model, and then we went through techniques to reduce inference time: choosing the right hardware and libraries, optimizing the input size, and optimizing post-processing. We also covered techniques to make a slower model appear to the user as if it were running in real time, as well as ways to reduce the model size.
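As a brief recap of the measurement protocol, the following sketch times a Keras model while excluding warm-up runs and averaging the remaining ones. The model choice (MobileNetV2), the input shape, and the run counts are illustrative placeholders, not values from the chapter:

```python
import time

import numpy as np
import tensorflow as tf

def measure_inference_time(model, input_shape=(1, 224, 224, 3),
                           num_warmup=10, num_runs=100):
    """Return the average per-call inference time in milliseconds."""
    images = tf.random.uniform(input_shape)  # dummy input batch
    # Warm-up calls pay one-off costs (graph tracing, memory
    # allocation) and should not be counted in the measurement:
    for _ in range(num_warmup):
        _ = model(images, training=False)
    # Timed runs, averaged to smooth out system jitter:
    times = []
    for _ in range(num_runs):
        start = time.perf_counter()
        _ = model(images, training=False)
        times.append(time.perf_counter() - start)
    return 1000 * np.mean(times)

model = tf.keras.applications.MobileNetV2()  # any Keras model works here
print(f"Average inference time: {measure_inference_time(model):.1f} ms")
```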
Then, we introduced on-device ML, along with its benefits and limitations. We learned how to convert TensorFlow and Keras models to formats that are compatible with on-device deep learning frameworks. With examples for iOS, Android, and the browser, we covered a wide range of devices. We also introduced some popular embedded devices.
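As a reminder of the conversion workflow, here is a minimal sketch using TensorFlow's `tf.lite.TFLiteConverter`; the MobileNetV2 model and the output filename are placeholders:

```python
import tensorflow as tf

# Any trained Keras model can be converted; MobileNetV2 stands in here.
model = tf.keras.applications.MobileNetV2()

# Convert the model to the TensorFlow Lite flat-buffer format:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional weight quantization
tflite_model = converter.convert()

# The resulting bytes can be bundled with an iOS or Android app:
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

For browser deployment, the `tensorflowjs_converter` command-line tool performs the analogous conversion to the TensorFlow.js format.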
Throughout this book, we have presented TensorFlow 2 in detail, applying it to multiple computer vision tasks. We have covered a variety of state-of-the-art solutions, providing both a theoretical background...