Improving sentiment analysis with BERT and Transformers
BERT (Bidirectional Encoder Representations from Transformers, https://arxiv.org/abs/1810.04805v2) is a model based on the Transformer architecture that has achieved significant success on a wide range of language understanding tasks in recent years.
As its name implies, bidirectionality is one significant difference between BERT and earlier models. Traditional language models often process a sequence in a single direction (for example, left to right), whereas BERT attends to the entire context on both sides of each token. This bidirectional view of the context makes the model more effective at capturing nuanced relationships within a sequence.
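A small illustration of this idea, assuming the Hugging Face `transformers` library is installed: BERT's masked-word prediction draws on the words both before and after the blank, which is exactly the kind of cue that matters for sentiment.

```python
# Minimal sketch (assumes Hugging Face `transformers` is available) showing how
# BERT uses context on BOTH sides of a masked word to predict it.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The words before the mask ("The service was") and after it
# ("so we left a big tip") jointly push the prediction toward
# a positive adjective such as "excellent" or "great".
for prediction in fill_mask("The service was [MASK], so we left a big tip."):
    print(prediction["token_str"], round(prediction["score"], 3))
```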
Architecturally, BERT is essentially a stack of Transformer encoder layers. It is pre-trained on large amounts of unlabeled text in a self-supervised manner; during pre-training it learns to understand the meaning of text in context. After pre-training, BERT can be fine-tuned for specific downstream tasks, such as sentiment analysis.
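To make the pre-train/fine-tune split concrete, here is a minimal sketch of setting up BERT for sentiment classification, again assuming the Hugging Face `transformers` and `torch` packages; the example texts and labels are purely hypothetical.

```python
# Sketch: loading a pre-trained BERT encoder with an untrained
# classification head on top, ready for sentiment fine-tuning.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# The encoder weights come from pre-training; the classification head
# is new and gets learned during fine-tuning.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0 = negative, 1 = positive
)

# Toy labeled examples, just to show the shapes involved.
texts = ["I loved this film!", "The plot was a complete mess."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)

# In a real fine-tuning loop, `outputs.loss` would be back-propagated;
# `outputs.logits` holds one score per class for each input sentence.
print(outputs.loss, outputs.logits.shape)
```

The key point of the sketch is that only the small classification head starts from scratch; the heavy lifting was already done during pre-training.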
Let’s first talk about how the pre-training works.