Using BERT and OpenAI embeddings instead of word embeddings
Instead of word embeddings, we can use Bidirectional Encoder Representations from Transformers (BERT) embeddings. Like word embeddings, a BERT model is pretrained and produces vector representations, but it takes context into account and can represent a whole sentence rather than individual words.
Getting ready
For this recipe, we can use the Hugging Face sentence_transformers package to represent sentences as vectors. We need PyTorch, which is installed as part of the poetry environment. To get the vectors, we will use the all-MiniLM-L6-v2 model.
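As a minimal sketch of what this looks like (the example sentence here is purely illustrative), loading the model and encoding a sentence with sentence_transformers takes only a few lines:

from sentence_transformers import SentenceTransformer

# Load the pretrained all-MiniLM-L6-v2 sentence encoder
model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode a whole sentence into a single fixed-size vector
sentences = ["He went to the store to buy groceries."]
embeddings = model.encode(sentences)

print(embeddings.shape)  # all-MiniLM-L6-v2 produces 384-dimensional vectors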
We can also use embeddings from OpenAI's large language models (LLMs). To use the OpenAI embeddings, you will need to create an account and get an API key. You can create an account at https://platform.openai.com/signup.
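As a sketch of the OpenAI side, the call below assumes the v1 openai Python client and the API key stored in an environment variable; the model name text-embedding-3-small is one available embedding model, not one specified by this recipe:

import os
from openai import OpenAI

# The client needs the API key; here it is read from an environment variable
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Request an embedding for a sentence from an OpenAI embedding model
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="He went to the store to buy groceries.",
)

vector = response.data[0].embedding
print(len(vector))  # text-embedding-3-small returns 1536-dimensional vectors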
The notebook is located at https://github.com/PacktPublishing/Python-Natural-Language...