Deploying a language model with ONNX, TensorRT, and NVIDIA Triton Server
The three tools are ONNX, TensorRT, and NVIDIA Triton Server. ONNX and TensorRT are used for GPU-based inference acceleration, while NVIDIA Triton Server hosts the model behind HTTP or gRPC APIs. We will explore all three tools practically in this section. TensorRT is widely regarded as providing the strongest GPU-targeted model optimizations to speed up inference, and NVIDIA Triton Server is a battle-tested server for hosting deep learning models with native TensorRT compatibility. ONNX, on the other hand, is the intermediate format in this setup, which we will use primarily to store the model weights in a representation that TensorRT can load directly.
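To make the role of ONNX concrete, the sketch below exports a Hugging Face PyTorch model to an ONNX file that TensorRT can later build an optimized engine from. This is a minimal sketch only: the model name distilgpt2, the output path model.onnx, and the opset version are illustrative assumptions, not part of the original setup.

```python
# Minimal sketch: export a Hugging Face causal LM to ONNX so TensorRT can
# consume it later. Model name, output path, and opset are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "distilgpt2"   # assumption: any small causal LM works the same way
onnx_path = "model.onnx"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.config.use_cache = False    # skip KV-cache outputs to keep the graph simple
model.config.return_dict = False  # return plain tuples, which the tracer handles
model.eval()

# A dummy input lets the exporter trace the computation graph.
dummy = tokenizer("Hello, TensorRT!", return_tensors="pt")

with torch.no_grad():
    torch.onnx.export(
        model,
        (dummy["input_ids"],),
        onnx_path,
        input_names=["input_ids"],
        output_names=["logits"],
        dynamic_axes={
            "input_ids": {0: "batch", 1: "sequence"},
            "logits": {0: "batch", 1: "sequence"},
        },
        opset_version=17,
    )
```

Hugging Face's Optimum project also offers a higher-level export path (for example, the optimum-cli export onnx command) if you prefer not to call torch.onnx.export directly.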
In this practical tutorial, we will deploy a language model sourced from Hugging Face that runs on most NVIDIA GPU devices. We will convert our PyTorch-based language model from Hugging Face into ONNX weights, which will allow TensorRT to load the Hugging Face...