Introduction to sentence embeddings
Pretrained BERT models do not produce efficient and independent sentence embeddings on their own; they always need to be fine-tuned in an end-to-end supervised setting. This is because a pretrained BERT model is best thought of as an indivisible whole whose semantic knowledge is spread across all layers, not just the final one. Without fine-tuning, using its internal representations independently can be ineffective. It is also hard to handle unsupervised tasks such as clustering, topic modeling, information retrieval, or semantic search. In clustering, for instance, we have to evaluate many sentence pairs, which causes massive computational overhead.
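To see why raw BERT representations are awkward to use directly, here is a minimal sketch that mean-pools the last hidden states of a pretrained model into one vector per sentence (the model name, example sentences, and pooling choice are illustrative assumptions, not a recommended setup); without fine-tuning, such vectors typically capture sentence-level semantics poorly.

```python
# Illustrative sketch: naively deriving sentence vectors from pretrained BERT
# by mean-pooling its last hidden states (assumed model and sentences).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["The cat sits on the mat.", "A dog plays in the garden."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool token embeddings, ignoring padding positions via the attention mask
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (2, 768) for bert-base-uncased
```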
Luckily, many modifications of the original BERT model, such as Sentence-BERT (SBERT), have been proposed to derive semantically meaningful and independent sentence embeddings. SBERT trains BERT with a different objective: sentences are paired via a twin (Siamese) network that shares weights, each sentence is pooled into a fixed-size embedding, and the pair is evaluated with a similarity measure such as cosine similarity.
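A minimal sketch of how SBERT-style embeddings are used in practice, via the sentence-transformers library; the model name and example sentences are illustrative assumptions. Each sentence is encoded once into a fixed-size vector, so pairwise comparison reduces to a cheap cosine computation rather than a full model pass per pair.

```python
# Illustrative sketch: independent sentence embeddings with sentence-transformers
# (assumed model name and sentences).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["The cat sits on the mat.", "A dog plays in the garden."]
embeddings = model.encode(sentences)  # one fixed-size vector per sentence

# Because embeddings are independent, similarity is a simple vector operation
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(similarity.item())
```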