BERT-based transfer learning
Embeddings like GloVe are context-free: each word receives a single vector regardless of its surrounding text. This lack of context can be limiting in NLP. As discussed before, the word bank can mean different things depending on the context. Bidirectional Encoder Representations from Transformers, or BERT, came out of Google Research in October 2018 and demonstrated significant improvements over prior baselines. The BERT model builds on several innovations that came before it, and the BERT paper introduces several innovations of its own.
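To make the limitation concrete, here is a minimal sketch of a context-free lookup. The vectors below are hypothetical toy values, not real GloVe embeddings; the point is only that a static table returns the identical vector for "bank" in every sentence, so the financial and riverside senses are indistinguishable.

```python
# Hypothetical toy embedding table (not real GloVe values).
STATIC_EMBEDDINGS = {
    "bank": [0.21, -0.47, 0.33],   # one vector, regardless of sense
    "river": [0.05, 0.91, -0.12],
    "money": [-0.30, 0.22, 0.78],
}

def embed(sentence):
    """Look up a context-free vector for each known word in the sentence."""
    return [STATIC_EMBEDDINGS[w] for w in sentence.split() if w in STATIC_EMBEDDINGS]

# "bank" in a financial context vs. a riverside context:
vec_finance = embed("deposit money at the bank")[-1]
vec_river = embed("sit on the river bank")[-1]

# A context-free model cannot tell the two senses apart:
print(vec_finance == vec_river)  # True
```

A contextual model such as BERT, by contrast, produces a different vector for each occurrence of "bank", conditioned on the words around it.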
Two foundational advances that enabled BERT are the encoder-decoder network architecture and the attention mechanism. Attention evolved further into the Transformer architecture, which is the fundamental building block of BERT. These concepts are introduced next and detailed further in later chapters. After these two sections, we discuss the specific innovations and structure of the BERT model.