Mastering Transformers: The Journey from BERT to Large Language Models and Stable Diffusion

Product type: Paperback
Published: Jun 2024
Publisher: Packt
ISBN-13: 9781837633784
Length: 462 pages
Edition: 2nd Edition
Authors: Savaş Yıldırım, Meysam Asgari-Chenaghlu
Table of Contents

Preface
Part 1: Recent Developments in the Field, Installations, and Hello World Applications
Chapter 1: From Bag-of-Words to the Transformers
Chapter 2: A Hands-On Introduction to the Subject
Part 2: Transformer Models: From Autoencoders to Autoregressive Models
Chapter 3: Autoencoding Language Models
Chapter 4: From Generative Models to Large Language Models
Chapter 5: Fine-Tuning Language Models for Text Classification
Chapter 6: Fine-Tuning Language Models for Token Classification
Chapter 7: Text Representation
Chapter 8: Boosting Model Performance
Chapter 9: Parameter Efficient Fine-Tuning
Part 3: Advanced Topics
Chapter 10: Large Language Models
Chapter 11: Explainable AI (XAI) in NLP
Chapter 12: Working with Efficient Transformers
Chapter 13: Cross-Lingual and Multilingual Language Modeling
Chapter 14: Serving Transformer Models
Chapter 15: Model Tracking and Monitoring
Part 4: Transformers beyond NLP
Chapter 16: Vision Transformers
Chapter 17: Multimodal Generative Transformers
Chapter 18: Revisiting Transformers Architecture for Time Series
Index
Other Books You May Enjoy

What this book covers

Chapter 1, From Bag-of-Words to Transformers, will present a gentle and easy-to-understand introduction to Transformers. Additionally, it will compare older methods, including non-deep learning and deep learning models, such as CNNs, RNNs, and LSTMs for text, with Transformers.

Chapter 2, A Hands-On Introduction to the Subject, provides the first hands-on experiment for setting up your environment. In this chapter, you will learn about all aspects of Transformers at an introductory level. It explains how to install TensorFlow, PyTorch, conda, and the transformers and sentence-transformers libraries, and a separate section is dedicated to installing the GPU versions of the related libraries. We provide hands-on code examples so you can start learning by example, and you will write your first hello-world program with Transformers by loading community-provided pre-trained language models.
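
As a taste of that hello-world style, the following minimal sketch (assuming the transformers library and a PyTorch backend are installed) loads a community-provided pre-trained model through the pipeline API; the example sentence is arbitrary:

```python
# Assumes: pip install torch transformers
from transformers import pipeline

# Load a community-provided pre-trained sentiment model (downloaded on first use)
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers make NLP experiments surprisingly easy."))
```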

Chapter 3, Autoencoding Language Models, covers how to use the encoder part of the Transformer architecture. You will learn about the architecture of the BERT model and how to train/test autoencoding language models such as BERT and ALBERT. You will understand the differences between the models and the objective functions of their architectures. You will also learn how to share models with the community and how to fine-tune other pre-trained language models shared by the community.
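
For illustration, a minimal masked-language-modeling sketch with a pre-trained BERT checkpoint (the model name and prompt are just examples) looks like this:

```python
from transformers import pipeline

# BERT is an autoencoding (masked) language model: it predicts the masked token
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```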

Chapter 4, From Generative Models to Large Language Models, covers generative solutions with decoder-only Transformers and encoder-decoder models. In this chapter, you will see the details of Generative Language Models (GLMs) and Large Language Models (LLMs). You will learn how to pre-train any language model, such as Generative Pre-trained Transformer 2 (GPT-2), on your own text and use it in various tasks, such as Natural Language Generation (NLG). You will understand the basics of the Text-to-Text Transfer Transformer (T5) model, how to conduct a hands-on multitask learning experiment with T5, and how to train a Multilingual T5 (mT5) model on your own Machine Translation (MT) data. After finishing this chapter, you will have an understanding of GLMs and their various use cases in text2text applications, such as summarization, paraphrasing, multitask learning, zero-shot learning, and MT.
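
As a quick illustration of the generative side, the following sketch samples a continuation from a pre-trained GPT-2 checkpoint (the prompt and generation settings are arbitrary choices):

```python
from transformers import pipeline

# GPT-2 is a decoder-only generative language model
generator = pipeline("text-generation", model="gpt2")
result = generator("Transformers are", max_length=30, num_return_sequences=1)
print(result[0]["generated_text"])
```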

Chapter 5, Fine-Tuning Language Models for Text Classification, teaches you how to load a pre-trained model for text classification and fine-tune a base model for any downstream text classification task, such as sentiment analysis. You will work with well-known datasets, such as GLUE, as well as your own datasets. You will also take advantage of the Trainer class of the transformers library, which handles much of the complexity of the training and fine-tuning processes.
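
The shape of such a fine-tuning run with the Trainer class is sketched below; the dataset, checkpoint, and hyperparameters are illustrative choices, not the book's exact setup:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("glue", "sst2")  # binary sentiment task from GLUE
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True)

encoded = dataset.map(tokenize, batched=True)
args = TrainingArguments(output_dir="sst2-finetune",
                         per_device_train_batch_size=16,
                         num_train_epochs=1)
trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"],
                  tokenizer=tokenizer)
trainer.train()
```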

Chapter 6, Fine-Tuning Language Models for Token Classification, covers how to solve token-wise classification tasks such as named-entity recognition (NER), part-of-speech (POS) tagging, and span prediction by fine-tuning Transformer models. Token classification, one of the more challenging topics in NLP, is implemented by fine-tuning models (or training them from scratch) with the Trainer class of the Python transformers library; NER, POS tagging, and question answering (QA) can all be framed as token classification problems. GLUE and community-provided datasets are used for quick prototyping, and we will use shared models and datasets to fine-tune models for token classification problems. We will also discuss how to benchmark token classification for evaluation.
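
For a flavor of token classification at inference time, the sketch below runs a community-shared NER checkpoint (the model name is just one of many public options):

```python
from transformers import pipeline

# Token classification: each token (or aggregated word span) gets a label such as PER or LOC
ner = pipeline("token-classification", model="dslim/bert-base-NER",
               aggregation_strategy="simple")
print(ner("George Washington lived in Virginia."))
```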

Chapter 7, Text Representation, will look at how to represent text with a Transformer-based architecture for a variety of NLP tasks. Text representation is another essential task in modern NLP. Representing sentences with models such as the Universal Sentence Encoder and Siamese BERT (Sentence-BERT), with additional libraries such as sentence-transformers, will be explained here. Datasets such as MNLI and XNLI will be described, and benchmarking sentence representation models with the STS Benchmark will be explored. Few-/zero-shot learning using efficient sentence representations will also be covered, and unsupervised tasks such as clustering using transfer learning will be presented.
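
A minimal sentence-similarity sketch with the sentence-transformers library (the model name and sentences are illustrative) looks like this:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small general-purpose sentence encoder
sentences = ["A man is playing a guitar.",
             "Someone is performing music.",
             "The stock market fell today."]
embeddings = model.encode(sentences, convert_to_tensor=True)
# Cosine similarity of the first sentence against the other two
print(util.cos_sim(embeddings[0], embeddings[1:]))
```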

Chapter 8, Boosting Model Performance, discusses many techniques for improving on vanilla fine-tuning performance. In this chapter, we’ll explore how to boost Transformer models beyond vanilla training. Data augmentation, domain adaptation, and hyperparameter optimization are the main topics for improving model performance.
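
As one example of the hyperparameter-optimization theme, the Trainer class exposes a hyperparameter_search method; the sketch below assumes the optuna backend is installed, and its dataset, search space, and trial count are placeholders:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
data = load_dataset("glue", "sst2").map(
    lambda batch: tokenizer(batch["sentence"], truncation=True), batched=True)

def model_init():  # a fresh model is created for every trial
    return AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

def hp_space(trial):  # Optuna-style search space (placeholder ranges)
    return {"learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
            "num_train_epochs": trial.suggest_int("num_train_epochs", 1, 3)}

trainer = Trainer(model_init=model_init,
                  args=TrainingArguments(output_dir="hp-search"),
                  train_dataset=data["train"].shuffle(seed=42).select(range(2000)),
                  eval_dataset=data["validation"],
                  tokenizer=tokenizer)
best_run = trainer.hyperparameter_search(hp_space=hp_space, backend="optuna",
                                         n_trials=5, direction="minimize")
print(best_run)
```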

Chapter 9, Parameter Efficient Fine-Tuning, shows how, instead of fine-tuning all the parameters of an LLM, we can find cheaper and more efficient ways to fine-tune with advanced techniques such as LoRA. Although fine-tuning pre-trained Transformer models is a very useful way to solve NLP tasks, vanilla fine-tuning can be parameter-inefficient in many ways. We can use an adapter method such as LoRA to reduce the number of parameters that need to be fine-tuned.
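
A minimal LoRA sketch with the peft library is shown below; the checkpoint, rank, and target modules are illustrative (for GPT-2 the fused attention projection is named c_attn):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForCausalLM.from_pretrained("gpt2")
lora_config = LoraConfig(task_type=TaskType.CAUSAL_LM,
                         r=8, lora_alpha=16, lora_dropout=0.05,
                         target_modules=["c_attn"])  # GPT-2 attention projection
model = get_peft_model(base_model, lora_config)
# Only the small LoRA adapter matrices are trainable; the base weights stay frozen
model.print_trainable_parameters()
```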

Chapter 10, Large Language Models, discusses LLMs such as T5 and LLaMA, with fine-tuning and efficient inference as the main focus. The field of LLMs has made significant progress in recent years with the development of models such as GPT-3 (175B), PaLM (540B), BLOOM (175B), LLaMA (65B), Falcon (180B), and Mistral (7B). These models have shown impressive abilities in various natural language tasks. It is challenging to cover such an important topic in a single chapter; however, we will already have covered many aspects of the topic, particularly in Chapter 4, and throughout the book we discuss the paradigm of neural language models and their training process.
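
As a hedged sketch of the efficient-inference theme, a decoder-only LLM can be loaded in half precision and sharded across available devices; the checkpoint name is a placeholder (gated models such as LLaMA require approved access on the Hub), and device_map="auto" assumes the accelerate package is installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "meta-llama/Llama-2-7b-hf"  # placeholder; requires approved access
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint,
                                             torch_dtype=torch.float16,
                                             device_map="auto")  # shard across GPUs/CPU

inputs = tokenizer("Large language models are", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```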

Chapter 11, Explainable AI (XAI) in NLP, discusses potential XAI approaches for NLP with hands-on experiments. It is very difficult to make sense of the decisions made by deep learning models during inference, so we need ways to interpret these black-box models. In this chapter, we will see how to apply Explainable AI (XAI) approaches to NLP.
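
One simple entry point, sketched below, is to inspect a model's attention weights, which is just one of several XAI-flavored signals; the model and sentence are illustrative:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("Transformers are not complete black boxes.", return_tensors="pt")
outputs = model(**inputs)

# One attention tensor per layer, shaped (batch, heads, seq_len, seq_len)
print(len(outputs.attentions), outputs.attentions[0].shape)
```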

Chapter 12, Working with Efficient Transformers, addresses the difficulty of running large models under limited computational capacity. It is now important to be able to pre-train a smaller general-purpose language model such as DistilBERT, which can then be fine-tuned with good performance on a wide variety of problems, like its non-distilled counterparts. Transformer-based architectures also face complexity bottlenecks due to the quadratic complexity of the attention dot product and long-context NLP tasks; character-based language models, speech processing, and long documents are among the long-context problems. In recent years, we have seen much progress in making self-attention more efficient, with models such as Reformer, Performer, and BigBird offered as solutions to this complexity.
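
A quick way to see the distillation payoff is to compare parameter counts, as in the sketch below (the checkpoint names are the standard public ones):

```python
from transformers import AutoModel

bert = AutoModel.from_pretrained("bert-base-uncased")
distilbert = AutoModel.from_pretrained("distilbert-base-uncased")

def millions(model):
    return sum(p.numel() for p in model.parameters()) / 1e6

# DistilBERT keeps most of BERT's accuracy with substantially fewer parameters
print(f"BERT: {millions(bert):.0f}M parameters")
print(f"DistilBERT: {millions(distilbert):.0f}M parameters")
```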

Chapter 13, Cross-Lingual and Multilingual Language Modeling, discusses how to work with more than one language. Architectures and models such as XLM will be described in this chapter, and the move from monolingual to multilingual and cross-lingual language modeling will be explained. The concept of knowledge sharing between languages will be presented, as well as the impact of byte-pair encoding on tokenization in order to achieve better input. Cross-lingual sentence similarity using a cross-lingual corpus (XNLI) will be detailed. Tasks such as cross-lingual classification and the use of cross-lingual sentence representations for training on one language and testing on another will be presented with concrete examples of real-life NLP problems, such as multilingual intent classification.
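
As a small cross-lingual illustration, a multilingual sentence encoder maps sentences from different languages into a shared space so they can be compared directly; the model name and sentences below are illustrative:

```python
from sentence_transformers import SentenceTransformer, util

# A multilingual sentence encoder maps different languages into one embedding space
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

english = model.encode("The weather is nice today.", convert_to_tensor=True)
german = model.encode("Das Wetter ist heute schön.", convert_to_tensor=True)
turkish = model.encode("Borsa bugün düştü.", convert_to_tensor=True)  # unrelated meaning

print(util.cos_sim(english, german))   # high: same meaning, different language
print(util.cos_sim(english, turkish))  # lower: different meaning
```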

Chapter 14, Serving Transformer Models, focuses on how to go to production. Like any other real-life, modern solution, NLP-based solutions must be able to be served in a production environment, and metrics such as response time should be kept in mind while developing them. This chapter details how to serve a Transformer-based NLP solution in environments where a CPU or GPU is available. TensorFlow Extended (TFX) will be described as a solution for machine learning deployment, and other solutions for serving Transformers as APIs, such as FastAPI, will also be illustrated.
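
A minimal FastAPI sketch for serving a sentiment pipeline could look like the following; the endpoint name, module name, and model choice are placeholders, and the app would be run with an ASGI server such as uvicorn:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
classifier = pipeline("sentiment-analysis")  # loaded once at startup

class PredictionRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(request: PredictionRequest):
    # Returns e.g. {"label": "POSITIVE", "score": 0.99}
    return classifier(request.text)[0]

# Run with (assuming this file is named serve_model.py):
#   uvicorn serve_model:app --host 0.0.0.0 --port 8000
```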

Chapter 15, Model Tracking and Monitoring, explains that we need to monitor our model during training to know when to stop and to check for problems such as overfitting. We will track our experiments by logging and monitoring with TensorBoard, which allows us to efficiently host, track, and share experiment results, such as loss and accuracy metrics, or to project the learned embeddings to a lower-dimensional space.
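
With the Trainer API, TensorBoard logging is largely a matter of configuration, as in this sketch (the paths and logging interval are placeholders):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="results",   # checkpoints
    logging_dir="logs",     # TensorBoard event files are written here
    report_to="tensorboard",
    logging_steps=50,       # log training loss every 50 steps
)
# Pass training_args to a Trainer, then inspect the run with:
#   tensorboard --logdir logs
```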

Chapter 16, Vision Transformers, shows that, beyond NLP, we can also work with images. Transformers have achieved great results in NLP and, as you will have seen in the previous chapters, they can accomplish many different tasks. This chapter explains the Vision Transformer (ViT). Just as in NLP, different models have been created for vision, each offering a new perspective on computer vision. By reading this chapter, you will know how to use models such as ViT for computer vision tasks, how pre-trained Transformer-based computer vision models work, and how to fine-tune them for specific tasks.
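
For a first taste, a pre-trained ViT checkpoint can be used through the image-classification pipeline, as sketched below (the image path is a placeholder):

```python
from transformers import pipeline

# A ViT model pre-trained and fine-tuned on ImageNet classes
vit_classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

predictions = vit_classifier("cat.jpg")  # placeholder path to a local image
for prediction in predictions[:3]:
    print(prediction["label"], round(prediction["score"], 3))
```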

Chapter 17, Multimodal Generative Transformers, shows how we can also work with multiple modalities (text, image, audio, and so on) in one model. Models such as CLIP show promising results in multimodal (text-image) search, while other models and approaches, such as Stable Diffusion, perform very well at generating images from textual prompts. In this chapter, we will look into these models and first create a semantic search engine using CLIP. Next, you will learn how to create multimodal classifiers using CLIP and cross-modal pre-trained models. You will learn how to use Stable Diffusion and other related pre-trained prompt-based models, such as Midjourney. After that, you will learn how to create object detection solutions with no training data using zero-shot prompt-based multimodal pre-trained models.
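
The CLIP building block of that pipeline can be sketched as scoring a set of captions against an image; the image path and captions below are placeholders:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder local image
captions = ["a photo of a cat", "a photo of a dog", "a diagram of a transformer"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
# Higher probability means the caption matches the image better
print(outputs.logits_per_image.softmax(dim=1))
```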

Chapter 18, Revisiting Transformers Architecture for Time Series, covers how to restructure a Transformer for time series tasks to leverage its benefits. Transformers are well known for NLP-specific tasks, but their main power comes from their capability to model sequential data, which can be textual or non-textual. In this chapter, you will learn how to use Transformers for modeling and predicting time series data.
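
A minimal sketch of such a restructuring, using PyTorch's built-in Transformer encoder on a univariate series (the dimensions and forecast horizon are arbitrary choices, not the book's exact model):

```python
import torch
import torch.nn as nn

class TimeSeriesTransformer(nn.Module):
    def __init__(self, n_features=1, d_model=64, n_heads=4, n_layers=2, horizon=1):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)  # embed each time step
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, horizon)           # forecast the next value(s)

    def forward(self, x):                # x: (batch, seq_len, n_features)
        hidden = self.encoder(self.input_proj(x))
        return self.head(hidden[:, -1])  # predict from the last time step

model = TimeSeriesTransformer()
forecast = model(torch.randn(8, 30, 1))  # 8 series, 30 past time steps each
print(forecast.shape)                    # torch.Size([8, 1])
```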
