Google open sources BERT, an NLP pre-training technique

  • 2 min read
  • 05 Nov 2018


Google open-sourced Bidirectional Encoder Representations from Transformers (BERT), a technique for NLP pre-training, last Friday. Natural language processing (NLP) covers tasks such as sentiment analysis, language translation, and question answering. Large NLP datasets containing millions, or even billions, of annotated training examples are scarce.

Google says that with BERT, you can train your own state-of-the-art question answering system in about 30 minutes on a single Cloud TPU, or in a few hours on a single GPU. The source code is built on top of TensorFlow, and a number of pre-trained language representation models are included with the release.

BERT features


BERT improves on recent work in pre-training contextual representations, including semi-supervised sequence learning, generative pre-training, ELMo, and ULMFiT. BERT differs from these models in that it is the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus (Wikipedia).

Context-free models such as word2vec generate a single word embedding for each word in the vocabulary. Contextual models, on the other hand, generate a representation of each word based on the other words in the sentence. BERT is deeply bidirectional because it takes both the preceding and the following words into account.
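
As a rough illustration of the difference (a toy sketch, not BERT's actual architecture), a context-free lookup returns the same vector for "bank" in every sentence, while a contextual model derives the vector from the surrounding words as well:

    # Toy illustration (not BERT): context-free vs. contextual embeddings.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["the", "bank", "river", "deposit"]
    embeddings = {w: rng.normal(size=8) for w in vocab}  # one fixed vector per word

    def context_free(word):
        # word2vec-style: "bank" gets the same vector in every sentence
        return embeddings[word]

    def contextual(sentence, index, window=2):
        # Toy contextual model: average the word's vector with its neighbours,
        # so "bank" in "river bank" differs from "bank" in "bank deposit".
        # (BERT instead uses deep bidirectional Transformer layers.)
        lo, hi = max(0, index - window), min(len(sentence), index + window + 1)
        return np.mean([embeddings[w] for w in sentence[lo:hi]], axis=0)

    print(np.allclose(context_free("bank"), context_free("bank")))    # True
    print(np.allclose(contextual(["the", "river", "bank"], 2),
                      contextual(["the", "bank", "deposit"], 1)))      # False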

Bidirectionality


It is not possible to train a bidirectional model by simply conditioning each word on the words before and after it, because the word being predicted would indirectly see itself in a multi-layer model. To solve this, the Google researchers used a straightforward technique: mask out some of the words in the input and condition each word bidirectionally to predict the masked words. The idea itself is not new, but BERT is the first technique to use it successfully to pre-train a deep neural network.
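
A minimal sketch of that masking idea, simplified from BERT's actual recipe (which masks roughly 15% of tokens and sometimes substitutes random words instead of a mask token): hide a few input tokens and train the model to predict the original word at each hidden position from the unmasked words on both sides.

    # Toy sketch of building masked-language-model training data (simplified from BERT).
    import random

    MASK = "[MASK]"

    def mask_tokens(tokens, mask_prob=0.15, seed=42):
        """Randomly hide some tokens; return the masked input plus the
        positions and original words the model must predict bidirectionally."""
        rng = random.Random(seed)
        masked, labels = [], {}
        for i, tok in enumerate(tokens):
            if rng.random() < mask_prob:
                masked.append(MASK)
                labels[i] = tok   # the model is trained to recover this word
            else:
                masked.append(tok)
        return masked, labels

    tokens = "the man went to the store to buy a gallon of milk".split()
    masked, labels = mask_tokens(tokens)
    print(masked)   # the sentence with a few tokens replaced by [MASK]
    print(labels)   # {position: original word}, predicted from words before AND after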


Results


On the Stanford Question Answering Dataset (SQuAD) v1.1, BERT achieved a 93.2% F1 score, surpassing the previous state-of-the-art score of 91.6% and the human-level score of 91.2%. BERT also improves the state of the art by 7.6% absolute on the very challenging GLUE benchmark, a set of nine diverse Natural Language Understanding (NLU) tasks.

For more details, visit the Google Blog.
