You're reading from Transformers for Natural Language Processing: Build innovative deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, RoBERTa, and more

Product type Paperback

Published in Jan 2021

Publisher Packt

ISBN-13 9781800565791

Length 384 pages

Edition 1st Edition

Author (1):

Denis Rothman

Table of Contents (16) Chapters

Preface

1. Getting Started with the Model Architecture of the Transformer

2. Fine-Tuning BERT Models

3. Pretraining a RoBERTa Model from Scratch

4. Downstream NLP Tasks with Transformers

5. Machine Translation with the Transformer

6. Text Generation with OpenAI GPT-2 and GPT-3 Models

7. Applying Transformers to Legal and Financial Documents for AI Text Summarization

8. Matching Tokenizers and Datasets

9. Semantic Role Labeling with BERT-Based Transformers

10. Let Your Data Do the Talking: Story, Questions, and Answers

11. Detecting Customer Emotions to Make Predictions

12. Analyzing Fake News with Transformers

13. Other Books You May Enjoy

14. Index

Appendix: Answers to the Questions

Preprocessing a WMT dataset

Vaswani et al. (2017) report the Transformer's results on two benchmarks: the WMT 2014 English-to-German translation task and the WMT 2014 English-to-French translation task. On both tasks, the Transformer achieved state-of-the-art BLEU scores at the time of publication. BLEU is described in the Evaluating machine translation with BLEU section of this chapter.

The 2014 Workshop on Machine Translation (WMT) provided several European language datasets. One of them contained data taken from version 7 of the Europarl corpus. We will be using the French-English dataset from the European Parliament Proceedings Parallel Corpus 1996-2011. The archive is available at https://www.statmt.org/europarl/v7/fr-en.tgz.

Once you have downloaded and extracted the archive, we will preprocess the two parallel files:

europarl-v7.fr-en.en
europarl-v7.fr-en.fr
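
The download and extraction step can be sketched in Python as follows. This is a minimal sketch, not the book's own code: the helper names `download_archive` and `extract_archive` are hypothetical, and the sketch assumes the two parallel files land at the top level of the output directory after extraction.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL given in the text above.
EUROPARL_URL = "https://www.statmt.org/europarl/v7/fr-en.tgz"
# The two parallel files named in the text above.
EXPECTED_FILES = ("europarl-v7.fr-en.en", "europarl-v7.fr-en.fr")


def download_archive(url, archive_path):
    """Download the archive unless a file already exists at archive_path."""
    archive_path = Path(archive_path)
    if not archive_path.exists():
        archive_path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(archive_path))
    return archive_path


def extract_archive(archive_path, out_dir):
    """Extract a .tgz archive and return the paths of the two parallel files.

    Assumption: the files sit at the top level of the archive.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as tar:
        tar.extractall(str(out_dir))
    paths = [out_dir / name for name in EXPECTED_FILES]
    missing = [p.name for p in paths if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Missing after extraction: {missing}")
    return paths


# Example usage (performs a large download, so it is left commented out):
# archive = download_archive(EUROPARL_URL, "data/fr-en.tgz")
# en_path, fr_path = extract_archive(archive, "data")
```

Skipping the download when the archive is already on disk avoids fetching the corpus again on every run.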

We will load, clean, and reduce the size of the corpus.
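
To make these three steps concrete, here is a minimal sketch under stated assumptions. It is not the implementation used in the chapter: the function names (`load_lines`, `clean_line`, `clean_pairs`), the specific cleaning rules (lowercasing, stripping accents and punctuation, dropping non-alphabetic tokens), and the `max_lines` and `max_tokens` limits are all illustrative choices.

```python
import string
import unicodedata

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def load_lines(path, max_lines=None):
    """Load: read a UTF-8 file line by line, optionally keeping only the first max_lines."""
    lines = []
    with open(path, encoding="utf-8") as handle:
        for i, line in enumerate(handle):
            if max_lines is not None and i >= max_lines:
                break
            lines.append(line.rstrip("\n"))
    return lines


def clean_line(line):
    """Clean: strip accents, lowercase, remove punctuation, drop non-alphabetic tokens."""
    # Decompose accented characters, then discard the non-ASCII combining marks.
    line = unicodedata.normalize("NFD", line)
    line = line.encode("ascii", "ignore").decode("ascii")
    tokens = line.lower().translate(_PUNCT_TABLE).split()
    return " ".join(tok for tok in tokens if tok.isalpha())


def clean_pairs(src_lines, tgt_lines, max_tokens=50):
    """Reduce: keep aligned pairs that are non-empty and short on both sides."""
    pairs = []
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = clean_line(src), clean_line(tgt)
        if not src or not tgt:
            continue  # drop the pair if either side is empty after cleaning
        if len(src.split()) > max_tokens or len(tgt.split()) > max_tokens:
            continue  # drop overly long sentences
        pairs.append((src, tgt))
    return pairs


# Example usage, assuming the two files were extracted into ./data:
# en = load_lines("data/europarl-v7.fr-en.en", max_lines=100_000)
# fr = load_lines("data/europarl-v7.fr-en.fr", max_lines=100_000)
# pairs = clean_pairs(en, fr)
```

Filtering happens on aligned pairs rather than on each file separately, so dropping a sentence on one side never shifts the alignment with the other. Note that stripping punctuation also merges French elisions (for example, `l'ordre` becomes `lordre`), which is a simplification of this sketch.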

Let's start the preprocessing.

Preprocessing the raw data...
