Chapter 4, Pretraining a RoBERTa Model from Scratch
- RoBERTa uses a byte-level byte-pair encoding tokenizer. (True/False)
True.
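As a quick illustration, here is a minimal sketch (assuming the `transformers` library and the pretrained `roberta-base` checkpoint are available) that tokenizes a sentence with RoBERTa's byte-level BPE tokenizer:

```python
# Minimal sketch: byte-level BPE tokenization with a pretrained RoBERTa tokenizer.
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

# Byte-level BPE operates over raw bytes, so any character can be encoded
# as subword units without falling back to an <unk> token.
print(tokenizer.tokenize("Pretraining RoBERTa from scratch"))
# Prints a list of subword tokens; the Ġ symbol marks a token preceded by a space.
```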
- A trained Hugging Face tokenizer produces merges.txt and vocab.json. (True/False)
True.
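For reference, this is roughly how the chapter's tokenizer training step produces those two files. A minimal sketch, assuming the `tokenizers` library and a hypothetical local text file `corpus.txt`:

```python
import os

from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE tokenizer on a local text file (corpus.txt is a placeholder).
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus.txt"],
    vocab_size=52_000,
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

# save_model() writes exactly two files to the directory: vocab.json and merges.txt.
os.makedirs("my_tokenizer", exist_ok=True)
tokenizer.save_model("my_tokenizer")
```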
- RoBERTa does not use token-type IDs. (True/False)
True.
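A minimal sketch, assuming the `transformers` library and the `roberta-base` checkpoint, showing that the model configuration only needs a single token type and that, in recent versions of `transformers`, the tokenizer output carries no `token_type_ids`:

```python
from transformers import RobertaConfig, RobertaTokenizer

# RoBERTa drops BERT's next-sentence prediction objective, so one token type is enough.
config = RobertaConfig(type_vocab_size=1)
print(config.type_vocab_size)  # 1

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
# The encoding contains input_ids and attention_mask, but no token_type_ids.
print(tokenizer("RoBERTa needs no segment IDs.").keys())
```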
- DistilBERT has 6 layers and 12 heads. (True/False)
True.
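This can be checked directly from the default DistilBERT configuration shipped with the `transformers` library (a minimal sketch):

```python
from transformers import DistilBertConfig

# The default configuration mirrors the published DistilBERT architecture.
config = DistilBertConfig()
print(config.n_layers, config.n_heads)  # 6 12
```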
- A transformer model with 80 million parameters is enormous. (True/False)
False. A model with 80 million parameters is relatively small by today's transformer standards.
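A minimal sketch, assuming `transformers` with PyTorch, that builds a small RoBERTa-like model close to the one configured in this chapter and counts its parameters:

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# A small RoBERTa-like configuration: 6 encoder layers, 12 attention heads.
config = RobertaConfig(
    vocab_size=52_000,
    max_position_embeddings=514,
    num_attention_heads=12,
    num_hidden_layers=6,
    type_vocab_size=1,
)
model = RobertaForMaskedLM(config=config)

# Roughly 84 million parameters: tiny next to models with billions of parameters.
print(model.num_parameters())
```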
- We cannot train a tokenizer. (True/False)
False. A tokenizer can be trained.
- A BERT-like model has six decoder layers. (True/False)
False. A BERT-like model contains encoder layers, not decoder layers; the model built in this chapter stacks six encoder layers.
- MLM predicts a word hidden by a mask token in a sentence. (True/False)
True.
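A minimal sketch, assuming `transformers` and the `roberta-base` checkpoint, using the fill-mask pipeline to show MLM predicting the masked word:

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# RoBERTa's mask token is <mask>; the pipeline returns ranked candidate words.
for prediction in fill_mask("The capital of France is <mask>."):
    print(prediction["token_str"], prediction["score"])
```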
- A BERT-like model has no self-attention sublayers. (True/False)
False. A BERT-like model has self-attention sublayers in each of its encoder layers.
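A minimal sketch, assuming `transformers`, that inspects one encoder layer of a freshly initialized RoBERTa-like model and prints its self-attention sublayer:

```python
from transformers import RobertaConfig, RobertaForMaskedLM

model = RobertaForMaskedLM(RobertaConfig(num_hidden_layers=6))

# Each encoder layer holds a self-attention sublayer followed by a feed-forward sublayer.
print(model.roberta.encoder.layer[0].attention.self)
```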