Transformers
The next approach we’ll try is pretrained language models, which are the de facto standard in modern NLP. Thanks to public model repositories, like the Hugging Face Hub, we don’t need to train them from scratch, which can be very costly. Instead, we can plug a pretrained model into our architecture and fine-tune a small portion of the network on our dataset.
There is a wide variety of models: they differ in size, in the datasets they were pretrained on, in training techniques, and so on. But they all share the same simple API, so plugging them into our code is straightforward.
First, we need to install the library. For our task, we’ll use the package sentence-transformers==2.6.1, which you need to install manually (for example, with pip install sentence-transformers==2.6.1). Once this is done, you can use it to compute embeddings of any sentences given as strings; here we load all-MiniLM-L6-v2, one commonly used model:
>>> from sentence_transformers import SentenceTransformer
>>> # all-MiniLM-L6-v2 is one popular choice; any Sentence Transformers model id works here
>>> tr = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
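To see what we get back (a minimal sketch, assuming the all-MiniLM-L6-v2 model loaded above, with illustrative example sentences), encode maps a list of strings to one fixed-size vector per sentence, and the vectors can then be compared with cosine similarity:
>>> sentences = ["How do I reset my password?", "I forgot my login credentials."]
>>> embeddings = tr.encode(sentences)  # NumPy array, one row per sentence
>>> embeddings.shape
(2, 384)
>>> from sentence_transformers import util
>>> similarity = util.cos_sim(embeddings[0], embeddings[1])  # 1x1 tensor of cosine similarity
The 384-dimensional output is specific to all-MiniLM-L6-v2; other checkpoints produce embeddings of different sizes.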