We started off the chapter by understanding how Sentence-BERT works. We learned that Sentence-BERT is essentially a pre-trained BERT model that is fine-tuned for computing sentence representations, and that it applies mean or max pooling over the token embeddings to obtain a fixed-length sentence representation. To fine-tune the pre-trained BERT model, Sentence-BERT uses Siamese and triplet network architectures, which make fine-tuning faster and help in obtaining accurate sentence embeddings.
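As a quick recap of the pooling step, the following minimal sketch computes a sentence representation by mean pooling the token embeddings produced by a pre-trained BERT model. The use of the Hugging Face transformers library and the bert-base-uncased checkpoint here are illustrative assumptions; Sentence-BERT applies the same idea (and optionally max pooling) on top of its fine-tuned encoder:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load a plain pre-trained BERT model (bert-base-uncased is an illustrative choice)
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

sentences = ['Paris is a beautiful city', 'I love Paris']
inputs = tokenizer(sentences, padding=True, return_tensors='pt')

with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state   # [batch, seq_len, hidden]

# Mean pooling: average the token embeddings, ignoring padded positions
mask = inputs['attention_mask'].unsqueeze(-1).float()      # [batch, seq_len, 1]
sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embeddings.shape)                            # [batch, hidden]
```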
Later, we learned how to use the sentence-transformers library to compute sentence representations and to measure the semantic similarity between a pair of sentences. Following this, we learned how to make monolingual embeddings multilingual through knowledge distillation, where the student (XLM-R) is trained to generate multilingual embeddings that match the monolingual embeddings generated by the teacher (Sentence-BERT).
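As a quick recap of the sentence-transformers API, the sketch below encodes a sentence pair and computes the cosine similarity between the two embeddings. The model name 'bert-base-nli-mean-tokens' and the util.cos_sim helper are assumptions about the installed library version; any other pre-trained sentence-transformers model can be substituted:

```python
from sentence_transformers import SentenceTransformer, util

# Load a pre-trained Sentence-BERT model (model name is an illustrative assumption)
model = SentenceTransformer('bert-base-nli-mean-tokens')

sentence1 = 'It was a great day'
sentence2 = 'Today was awesome'

# Compute the sentence embeddings
embedding1 = model.encode(sentence1, convert_to_tensor=True)
embedding2 = model.encode(sentence2, convert_to_tensor=True)

# Compute the cosine similarity between the two sentence embeddings
similarity = util.cos_sim(embedding1, embedding2)
print(similarity.item())
```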
Next, we explored...