Regularization with BERT embeddings
Just as we used a pre-trained word2vec model to compute embeddings, we can use the embeddings of a pre-trained BERT model, a transformer-based model.
In this recipe, after quickly explaining the BERT model, we will train a model using BERT embeddings.
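Before going further, here is a minimal sketch of how such sentence embeddings could be extracted with the Hugging Face transformers library; the bert-base-uncased checkpoint, the example sentences, and the mean-pooling strategy are assumptions for illustration, not necessarily the exact setup used later in this recipe:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical checkpoint and sentences, for illustration only
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

sentences = ["The movie was great", "I did not enjoy it at all"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch_size, sequence_length, hidden_size=768)
# Mean-pool over real tokens (ignoring padding) to get one vector per sentence
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # torch.Size([2, 768])
```

These fixed-size vectors can then be fed to a downstream classifier, just as we did with word2vec embeddings.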
BERT stands for Bidirectional Encoder Representations from Transformers and is a model that was proposed by Google in 2018. It was first deployed in Google Search in late 2019, initially for English queries and soon after for many other languages. The BERT model has proven effective in several NLP tasks, including text classification and question answering.
Before diving into BERT itself, let’s take a step back and look at what attention mechanisms and transformers are.
Attention mechanisms have become widely used in NLP, and increasingly in other fields such as computer vision, since the transformer architecture was introduced in 2017. The high-level idea of an attention mechanism is to compute...