Production-Ready Applied Deep Learning

Learn how to construct and deploy complex models in PyTorch and TensorFlow deep learning frameworks

Product type: Paperback
Published: Aug 2022
Publisher: Packt
ISBN-13: 9781803243665
Length: 322 pages
Edition: 1st Edition

Authors (3): Lenin Mookiah, Tomasz Palczewski, Jaejun (Brandon) Lee

Table of Contents

Preface
Part 1 – Building a Minimum Viable Product
Chapter 1: Effective Planning of Deep Learning-Driven Projects
Chapter 2: Data Preparation for Deep Learning Projects
Chapter 3: Developing a Powerful Deep Learning Model
Chapter 4: Experiment Tracking, Model Management, and Dataset Versioning
Part 2 – Building a Fully Featured Product
Chapter 5: Data Preparation in the Cloud
Chapter 6: Efficient Model Training
Chapter 7: Revealing the Secret of Deep Learning Models
Part 3 – Deployment and Maintenance
Chapter 8: Simplifying Deep Learning Model Deployment
Chapter 9: Scaling a Deep Learning Pipeline
Chapter 10: Improving Inference Efficiency
Chapter 11: Deep Learning on Mobile Devices
Chapter 12: Monitoring Deep Learning Endpoints in Production
Chapter 13: Reviewing the Completed Deep Learning Project
Index
Other Books You May Enjoy

Improving Inference Efficiency

When a deep learning (DL) model is deployed on an edge device, its inference efficiency is often unsatisfactory. These issues mostly stem from the size of the trained network: the more parameters it has, the more computation each prediction requires. As a result, engineers and scientists often trade some accuracy for speed when deploying a DL model on an edge device. They also focus on reducing the model size, as edge devices often have limited storage space.

In this chapter, we will introduce techniques for reducing inference latency while preserving the original model's performance as much as possible. First, we will cover network quantization, a technique that decreases the network size by using lower-precision data formats for the model parameters. Next, we will talk about weight sharing, also known as weight clustering. It is a very interesting concept in which a small number of distinct weight values are shared across the whole network, reducing the disk space needed to store the trained...
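The two techniques introduced above can be sketched in a few lines. This is a minimal illustration, assuming pure Python lists stand in for a layer's weight tensor; the names `quantize_int8`, `dequantize_int8`, and `cluster_weights` are hypothetical and are not PyTorch or TensorFlow APIs. Quantization is shown as per-tensor affine int8 quantization (a scale plus a zero point), and weight sharing is shown as 1-D k-means clustering, one common way to choose the shared values, not necessarily the exact method used later in the chapter.

```python
"""Sketches of int8 quantization and weight sharing (weight clustering).

Pure Python lists stand in for a layer's weight tensor. All function names
here are hypothetical illustrations, not PyTorch or TensorFlow APIs.
"""
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float, int]:
    """Affine int8 quantization: w is approximated by (q - zero_point) * scale."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    # 256 int8 levels span [w_min, w_max]; fall back to 1.0 for all-zero input.
    scale = (w_max - w_min) / 255.0 or 1.0
    zero_point = round(-128 - w_min / scale)
    # Round to the nearest level and clamp into the int8 range [-128, 127].
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize_int8(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [(v - zero_point) * scale for v in q]


def _nearest(weights: List[float], centroids: List[float]) -> List[int]:
    """Index of the closest centroid for every weight."""
    return [min(range(len(centroids)), key=lambda c: abs(w - centroids[c]))
            for w in weights]


def cluster_weights(weights: List[float], k: int,
                    iters: int = 20) -> Tuple[List[int], List[float]]:
    """Weight sharing via 1-D k-means: each weight maps to one of k shared values."""
    lo, hi = min(weights), max(weights)
    # Linear initialization: spread k centroids evenly over the weight range.
    centroids = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    for _ in range(iters):
        labels = _nearest(weights, centroids)
        # Update step: move each centroid to the mean of the weights assigned to it.
        for c in range(k):
            members = [w for w, lab in zip(weights, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    # Final assignment so the labels match the returned centroids.
    return _nearest(weights, centroids), centroids


if __name__ == "__main__":
    layer = [-0.42, -0.11, 0.0, 0.07, 0.31, 0.58, -0.27, 0.19]
    q, scale, zp = quantize_int8(layer)
    print("int8 codes:", q, "scale:", scale, "zero point:", zp)
    print("dequantized:", dequantize_int8(q, scale, zp))
    labels, shared = cluster_weights(layer, k=4)
    print("cluster indices:", labels)
    print("shared weights:", [shared[i] for i in labels])
```

Each int8 code takes 1 byte instead of the 4 bytes of a float32 value, so the stored weights shrink roughly 4x, plus one scale and one zero point per tensor (per-channel variants keep one pair per output channel instead). With weight sharing, each weight only stores a cluster index of ceil(log2 k) bits, for example 4 bits when k = 16, plus a small table of k shared float values.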
