Understanding the quantization concept
Quantization is a technique that reduces a model's size and thereby improves its efficiency. It does so by representing the model's parameters (and, in some schemes, its activations) with lower-precision numeric types. This technique is helpful when building models for mobile or edge deployment, where compute resources or power supply are constrained. Since our aim is to make the model run as efficiently as possible, we accept that the model becomes smaller and therefore less precise than the original. In other words, we transform the model into a lighter version of itself, and the transformed model is an approximation of the original.
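To make the approximation concrete, consider the affine mapping commonly used to quantize floating-point values to 8-bit integers. The following is a minimal NumPy sketch; the `quantize` and `dequantize` helpers and the chosen `scale` and `zero_point` values are illustrative, not from any particular library. Round-tripping a value through the 8-bit representation recovers only an approximation of the original:

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Map float values to int8 with an affine transform."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([0.13, -1.27, 3.71], dtype=np.float32)
q = quantize(x, scale=0.05, zero_point=0)
x_approx = dequantize(q, scale=0.05, zero_point=0)
# x_approx is [0.15, -1.25, 3.7]: close to x, but not exact.
```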
Quantization may be applied to a model that has already been trained; this is known as post-training quantization. Within this type of quantization, there are three approaches, illustrated by the converter sketch after this list:
- Reduced float quantization: Convert `float 32 bits` ops to `float 16` ops.
- Hybrid quantization: Convert weights to `8 bits`, while keeping biases and activations as `32 bits` ops.
- Integer quantization: Convert all ops, including weights and activations, to `8 bits` integer ops.
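As a concrete illustration of these three approaches, here is a minimal sketch assuming the TensorFlow Lite post-training quantization API (the section's wording suggests it, but naming the toolkit is our assumption). The `saved_model_dir` path and the `representative_data_gen` calibration function are hypothetical placeholders you would supply:

```python
import tensorflow as tf

# 1. Reduced float quantization: float32 ops become float16 ops.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
float16_model = converter.convert()

# 2. Hybrid (dynamic range) quantization: weights stored as 8-bit
#    integers, while biases and activations stay in float32.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
hybrid_model = converter.convert()

# 3. Full integer quantization: all ops run in 8-bit integers. A
#    representative dataset (representative_data_gen, hypothetical) lets
#    the converter calibrate the ranges of activations.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
int8_model = converter.convert()
```

Each step down in precision trades a little accuracy for a smaller, faster model; full integer quantization shrinks the model the most but is the only approach that requires calibration data.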