Transformers for Natural Language Processing and Computer Vision

Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3

Product type: Paperback
Published: February 2024
Publisher: Packt
ISBN-13: 9781805128724
Length: 730 pages
Edition: 3rd Edition
Author: Denis Rothman
Table of Contents

Preface
1. What Are Transformers?
2. Getting Started with the Architecture of the Transformer Model
3. Emergent vs Downstream Tasks: The Unseen Depths of Transformers
4. Advancements in Translations with Google Trax, Google Translate, and Gemini
5. Diving into Fine-Tuning through BERT
6. Pretraining a Transformer from Scratch through RoBERTa
7. The Generative AI Revolution with ChatGPT
8. Fine-Tuning OpenAI GPT Models
9. Shattering the Black Box with Interpretable Tools
10. Investigating the Role of Tokenizers in Shaping Transformer Models
11. Leveraging LLM Embeddings as an Alternative to Fine-Tuning
12. Toward Syntax-Free Semantic Role Labeling with ChatGPT and GPT-4
13. Summarization with T5 and ChatGPT
14. Exploring Cutting-Edge LLMs with Vertex AI and PaLM 2
15. Guarding the Giants: Mitigating Risks in Large Language Models
16. Beyond Text: Vision Transformers in the Dawn of Revolutionary AI
17. Transcending the Image-Text Boundary with Stable Diffusion
18. Hugging Face AutoTrain: Training Vision Models without Coding
19. On the Road to Functional AGI with HuggingGPT and its Peers
20. Beyond Human-Designed Prompts with Generative Ideation
Appendix A: Revolutionizing AI: The Power of Optimized Time Complexity in Transformer Models
Appendix B: Answers to the Questions
Index

What this book covers

Part I: The Foundations of Transformers

Chapter 1, What Are Transformers?, explains, at a high level, what transformers and Foundation Models are. We will first unveil the incredible power of the deceptively simple O(1) time complexity of transformer models that changed everything. We will then discover how a little-known transformer algorithm in 2017 rose to dominate so many domains and brought us Foundation Models.
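
As an illustration of this idea (a minimal sketch, not the book's code), self-attention relates every token to every other token in a single matrix operation, whereas a recurrent network has to step through the sequence token by token:

```python
import numpy as np

n, d = 6, 8                      # sequence length, model dimension
X = np.random.rand(n, d)         # toy token embeddings

# Attention: one matmul covers all token-to-token relations at once.
scores = X @ X.T / np.sqrt(d)    # (n, n) similarity matrix in a single step
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
attended = weights @ X           # every output token sees the whole sequence

# Recurrence: covering the same information requires n sequential steps.
h = np.zeros(d)
for x in X:                      # O(n) sequential dependency
    h = np.tanh(x + h)
```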

Chapter 2, Getting Started with the Architecture of the Transformer Model, goes through the background of NLP to understand how RNN, LSTM, and CNN architectures were abandoned and how the transformer architecture opened a new era. We will examine the Original Transformer’s architecture through the unique Attention Is All You Need approach invented by the Google Research and Google Brain authors. We will describe the theory of transformers and then get our hands dirty in Python to see how the multi-head attention sublayers work.
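
The following is a minimal NumPy sketch of a multi-head attention sublayer in the spirit of that exercise; the dimensions and random projections are illustrative, not the notebook's actual code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, n_heads=2):
    n, d_model = X.shape
    d_k = d_model // n_heads
    heads = []
    for _ in range(n_heads):
        # Random projections stand in for the learned matrices W_Q, W_K, W_V.
        W_q, W_k, W_v = (np.random.rand(d_model, d_k) for _ in range(3))
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        scores = softmax(Q @ K.T / np.sqrt(d_k))   # scaled dot-product attention
        heads.append(scores @ V)
    return np.concatenate(heads, axis=-1)          # concatenate the head outputs

X = np.random.rand(4, 8)                 # 4 toy tokens, d_model = 8
print(multi_head_attention(X).shape)     # (4, 8)
```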

Chapter 3, Emergent vs Downstream Tasks: The Unseen Depths of Transformers, bridges the gap between the functional and mathematical architecture of transformers by introducing emergence. We will then see how to measure the performance of transformers before exploring several downstream tasks, such as the Stanford Sentiment Treebank (SST-2), linguistic acceptability, and Winograd schemas.
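
As a taste of what a downstream task looks like in practice, here is a hedged sketch that scores SST-2-style sentences with a Hugging Face sentiment pipeline; the checkpoint name is an assumption, not necessarily the chapter's:

```python
from transformers import pipeline

# Common public SST-2 checkpoint, used here only as an illustration.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
samples = [
    "The movie was a masterpiece of suspense.",
    "The plot was predictable and the acting flat.",
]
for s in samples:
    print(s, "->", classifier(s)[0])   # e.g. {'label': 'POSITIVE', 'score': ...}
```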

Chapter 4, Advancements in Translations with Google Trax, Google Translate, and Gemini, goes through machine translation in three steps. We will first define what machine translation is. We will then preprocess a Workshop on Machine Translation (WMT) dataset. Finally, we will see how to implement machine translation.
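
For example, a minimal translation sketch with a public Hugging Face checkpoint might look like this (Helsinki-NLP/opus-mt-en-fr is an assumption, not necessarily the chapter's model):

```python
from transformers import pipeline

# English-to-French translation with an open Marian checkpoint.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("The transformer architecture changed machine translation.")
print(result[0]["translation_text"])
```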

Chapter 5, Diving into Fine-Tuning through BERT, builds on the architecture of the Original Transformer. Bidirectional Encoder Representations from Transformers (BERT) takes transformers into a vast new way of perceiving the world of NLP. Instead of analyzing a past sequence to predict a future sequence, BERT attends to the whole sequence! We will first go through the key innovations of BERT’s architecture and then fine-tune a BERT model by going through each step in a Google Colaboratory notebook. Like humans, BERT can learn tasks and perform other new ones without having to learn the topic from scratch.
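
A condensed fine-tuning sketch along these lines is shown below; SST-2 stands in for the chapter's dataset, and the hyperparameters are illustrative:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

dataset = load_dataset("glue", "sst2")
def encode(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=64)
encoded = dataset.map(encode, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-finetuned", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=encoded["train"].shuffle(seed=42).select(range(2000)),  # small slice for speed
    eval_dataset=encoded["validation"],
)
trainer.train()
```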

Chapter 6, Pretraining a Transformer from Scratch through RoBERTa, builds a RoBERTa transformer model from scratch using the Hugging Face PyTorch modules. The transformer will be both BERT-like and DistilBERT-like. First, we will train a tokenizer from scratch on a customized dataset. Finally, we will put the knowledge acquired in this chapter to work and pretrain a Generative AI customer support model on X (formerly Twitter) data.
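
Training a tokenizer from scratch can be sketched as follows; my_corpus.txt is a placeholder for the customized dataset:

```python
import os
from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE tokenizer on a placeholder corpus file.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["my_corpus.txt"],
    vocab_size=52_000,
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)
os.makedirs("my_tokenizer", exist_ok=True)
tokenizer.save_model("my_tokenizer")   # writes vocab.json and merges.txt
print(tokenizer.encode("Pretraining a transformer from scratch").tokens)
```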

Part II: The Rise of Suprahuman NLP

Chapter 7, The Generative AI Revolution with ChatGPT, goes through the tremendous improvements and diffusion of ChatGPT models into the everyday lives of developers and end-users. We will first examine the architecture of OpenAI’s GPT models before working with the GPT-4 API and its hyperparameters to implement several NLP examples. Finally, we will learn how to obtain better results with Retrieval Augmented Generation (RAG). We will implement an example of automated RAG with GPT-4.
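
A hedged sketch of a GPT-4 API call with a few of its hyperparameters is shown below (it assumes OPENAI_API_KEY is set in the environment); for RAG, retrieved passages would simply be added to the messages before the question:

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a concise NLP assistant."},
        {"role": "user", "content": "Summarize what a transformer is in one sentence."},
    ],
    temperature=0.2,     # lower values make the output more deterministic
    top_p=1.0,
    max_tokens=100,
)
print(response.choices[0].message.content)
```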

Chapter 8, Fine-Tuning OpenAI GPT Models, explores fine-tuning so that we can decide whether or not a project should go in that direction. We will introduce risk management perspectives. We will prepare a dataset and fine-tune a cost-effective babbage-002 model for a completion task.
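
The fine-tuning flow can be sketched as follows; the JSONL file name is a placeholder for the prepared dataset:

```python
from openai import OpenAI

client = OpenAI()

# Upload the prepared prompt/completion dataset.
training_file = client.files.create(
    file=open("completion_dataset.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch the fine-tuning job on the cost-effective babbage-002 base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="babbage-002",
)
print(job.id, job.status)   # poll the job until it reports "succeeded"
```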

Chapter 9, Shattering the Black Box with Interpretable Tools, lifts the lid on the black box of transformer models by visualizing their activity. We will use BertViz to visualize attention heads, the Language Interpretability Tool (LIT) to carry out a Principal Component Analysis (PCA), and LIME to visualize transformers via dictionary learning. OpenAI LLMs will take us deeper and visualize the activity of a neuron in a transformer with an interactive interface. This approach opens the door to GPT-4 explaining a transformer, for example.
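
A hedged BertViz sketch, intended for a notebook environment such as Google Colab, looks like this:

```python
from transformers import AutoTokenizer, AutoModel
from bertviz import head_view

# Load BERT with attention outputs enabled so the heads can be visualized.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

inputs = tokenizer("The cat sat on the mat because it was tired.", return_tensors="pt")
outputs = model(**inputs)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_view(outputs.attentions, tokens)   # interactive attention-head visualization
```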

Chapter 10, Investigating the Role of Tokenizers in Shaping Transformer Models, introduces some tokenizer-agnostic best practices to measure the quality of a tokenizer. We will describe basic guidelines for datasets and tokenizers from a tokenization perspective. We will explore word and subword tokenizers and show how a tokenizer can shape a transformer model’s training and performance. Finally, we will build a function to display and control token-ID mappings.
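
A small helper in the spirit of that function might look like the following sketch; the model names are illustrative:

```python
from transformers import AutoTokenizer

def show_token_id_mapping(model_name, text):
    # Display how a given tokenizer splits the text and which ID each token gets.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokens = tokenizer.tokenize(text)
    ids = tokenizer.convert_tokens_to_ids(tokens)
    for token, token_id in zip(tokens, ids):
        print(f"{token!r:>15} -> {token_id}")

# Compare a WordPiece tokenizer (BERT) with a byte-level BPE tokenizer (GPT-2).
show_token_id_mapping("bert-base-uncased", "Tokenization shapes transformer training.")
show_token_id_mapping("gpt2", "Tokenization shapes transformer training.")
```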

Chapter 11, Leveraging LLM Embeddings as an Alternative to Fine-Tuning, explains why searching with embeddings can sometimes be a very effective alternative to fine-tuning. We will go through the advantages and limits of this approach. We will go through the fundamentals of text embeddings. We will build a program that reads a file, tokenizes it, and embeds it with Gensim and Word2Vec. We will implement a question-answering program on sports events and use OpenAI Ada to embed Amazon fine food reviews. By the end of the chapter, we will have taken a system from prompt design to advanced prompt engineering using embeddings for RAG.
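
The read-tokenize-embed flow with Gensim can be sketched as follows; corpus.txt and the hyperparameters are placeholders:

```python
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# Read a placeholder corpus file and tokenize it line by line.
with open("corpus.txt", encoding="utf-8") as f:
    sentences = [simple_preprocess(line) for line in f if line.strip()]

# Train Word2Vec embeddings on the tokenized sentences.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=2, workers=4)
print(model.wv.most_similar("match", topn=5))   # nearest neighbours in embedding space
```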

Chapter 12, Toward Syntax-Free Semantic Role Labeling with ChatGPT and GPT-4, goes through the revolutionary concepts of syntax-free, nonrepetitive stochastic models. We will use ChatGPT Plus with GPT-4 to run Semantic Role Labeling (SRL) samples ranging from easy to complex. We will see how a general-purpose, emergent model reacts to our SRL requests. We will progressively push the transformer model to the limits of SRL.
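
A hedged sketch of such a request through the API is shown below; the prompt wording is an illustration, not the book's exact prompt:

```python
from openai import OpenAI

client = OpenAI()
sentence = "Alice sold the old car to a collector in Paris last week."
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": (
            "Perform semantic role labeling on the following sentence. "
            "List the predicate and its arguments (agent, theme, recipient, "
            f"location, time): {sentence}"
        ),
    }],
    temperature=0,   # deterministic output makes the labels easier to compare
)
print(response.choices[0].message.content)
```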

Chapter 13, Summarization with T5 and ChatGPT, goes through the concepts and architecture of the T5 transformer model. We will then apply T5 to summarize documents with Hugging Face models. The examples in this chapter will be legal and medical texts, to explore domain-specific summarization beyond simple documents. We are not looking for an easy way to implement NLP but preparing ourselves for the reality of real-life projects. We will then compare the T5 and ChatGPT approaches to summarization.
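
A minimal T5 summarization sketch with a public checkpoint (t5-base is an assumption, and the legal snippet is invented for illustration) looks like this:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-base")
text = (
    "The licensee shall not reproduce, distribute, or create derivative works "
    "from the licensed materials without the prior written consent of the licensor, "
    "except as expressly permitted under applicable copyright law."
)
summary = summarizer(text, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```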

Chapter 14, Exploring Cutting-Edge LLMs with Vertex AI and PaLM 2, examines Pathways to understand PaLM. We will then look at the main features of PaLM (Pathways Language Model), a decoder-only, densely activated, and autoregressive transformer model with 540 billion parameters trained on Google’s Pathways system. We will see how Google PaLM 2 can perform a chat task, a discriminative task (such as classification), a completion task (also known as a generative task), and more. We will implement the Vertex AI PaLM 2 API for several NLP tasks, including question-answering and summarization. Finally, we will go through Google Cloud’s fine-tuning process.
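
A hedged Vertex AI PaLM 2 sketch is shown below; the project ID and location are placeholders, and text-bison is the commonly used PaLM 2 text model name:

```python
import vertexai
from vertexai.language_models import TextGenerationModel

# Initialize the Vertex AI SDK with placeholder project settings.
vertexai.init(project="your-gcp-project-id", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison")
response = model.predict(
    "Answer the question: who proposed the original transformer architecture?",
    temperature=0.2,
    max_output_tokens=128,
)
print(response.text)
```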

Chapter 15, Guarding the Giants: Mitigating Risks in Large Language Models, examines the risks of LLMs, risk management, and risk mitigation tools. The chapter explains hallucinations, memorization, risky emergent behavior, disinformation, influence operations, harmful content, adversarial attacks (“jailbreaks”), privacy, cybersecurity, and overreliance. We will then go through some risk mitigation tools through advanced prompt engineering, such as implementing a moderation model, a knowledge base, keyword parsing, prompt pilots, post-processing moderation, and embeddings.
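
One of these mitigation steps, screening a prompt with a moderation model before it reaches the main LLM, can be sketched as follows (OPENAI_API_KEY is assumed to be set):

```python
from openai import OpenAI

client = OpenAI()
user_prompt = "Explain how transformers encode a sentence."

# Run the prompt through the moderation endpoint before forwarding it.
moderation = client.moderations.create(input=user_prompt)
result = moderation.results[0]

if result.flagged:
    print("Prompt rejected by the moderation layer:", result.categories)
else:
    print("Prompt accepted, forwarding to the main model.")
```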

Part III: Generative Computer Vision: A New Way to See the World

Chapter 16, Beyond Text: Vision Transformers in the Dawn of Revolutionary AI, explores the innovative transformer models that respect the basic structure of the Original Transformer but make some significant changes. We will discover powerful computer vision transformers like ViT, CLIP, DALL-E, and GPT-4V. We will implement vision transformers in code, including GPT-4V, and expand the text-image interactions of DALL-E 3 to divergent semantic association. We will take OpenAI models into the nascent world of highly divergent semantic association creativity.
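
A hedged CLIP sketch that scores an image against candidate captions is shown below; photo.jpg is a placeholder path:

```python
import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")
captions = ["a photo of a cat", "a photo of a dog", "a diagram of a transformer"]
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)   # image-to-text similarity
for caption, p in zip(captions, probs[0]):
    print(f"{caption}: {p:.3f}")
```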

Chapter 17, Transcending the Image-Text Boundary with Stable Diffusion, delves into diffusion models, introducing Stable Diffusion, which has created a disruptive generative image AI wave rippling through the market. We will then dive into the principles, math, and code of the remarkable Keras Stable Diffusion model. We will go through each of the main components of a Stable Diffusion model, peek into the source code provided by Keras, and run the model. We will run a text-to-video synthesis model with Hugging Face and a video-to-text task with Meta’s TimeSformer.
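
The KerasCV interface can be sketched as follows; the prompt and image size are illustrative:

```python
import keras_cv
from PIL import Image

# Build the KerasCV Stable Diffusion model and generate one image from text.
model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)
images = model.text_to_image(
    "a watercolor painting of a robot reading a book",
    batch_size=1,
)
Image.fromarray(images[0]).save("generated.png")
```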

Chapter 18, Hugging Face AutoTrain: Training Vision Models without Coding, explores how to train a vision transformer using Hugging Face’s AutoTrain. We will go through the automated training process and discover the unpredictable problems that show why even automated ML requires human AI expertise. The goal of this chapter is also to show how to probe the limits of a computer vision model, no matter how sophisticated it is.

Chapter 19, On the Road to Functional AGI with HuggingGPT and its Peers, shows how we can use cross-platform chained models to solve difficult image classification problems. We will put HuggingGPT and Google Cloud Vision to work to identify easy, difficult, and very difficult images. We will go beyond classical pipelines and explore how to chain heterogeneous competing models.
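
The Google Cloud Vision step of such a pipeline can be sketched as follows; authentication via GOOGLE_APPLICATION_CREDENTIALS is assumed, and the image path is a placeholder:

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("hard_to_classify.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Ask the Vision API for labels describing the image.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, round(label.score, 3))
```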

Chapter 20, Beyond Human-Designed Prompts with Generative Ideation, explores generative ideation, an ecosystem that automates the production of text and image content from an idea. The development phase requires highly skilled human AI experts. For an end user, the ecosystem is a click-and-run experience. By the end of this chapter, we will be able to deliver ethical, exciting, generative ideation to companies with no marketing resources. We will be able to expand generative ideation to any field in an exciting, cutting-edge, yet ethical ecosystem.

Appendix A, Revolutionizing AI: The Power of Optimized Time Complexity in Transformer Models, gives you a detailed explanation of O(1) time complexity: what it is, how it works, and why it’s better than the O(n) alternative. This appendix also explores the token-to-token approach used by transformers.

Appendix B, Answers to the Questions, provides answers to all of the questions that you will find at the end of each chapter.
