FastText
Luckily for us, there is an extension of the word2vec model that can approximate vectors for unknown tokens – FastText. We can use it in much the same way as word2vec:
from gensim.models import FastText

# create the instance of the model
model = FastText(vector_size=4, window=3, min_count=1)
# build a vocabulary
model.build_vocab(corpus_iterable=tokenized_sentences)
# and train the model
model.train(
    corpus_iterable=tokenized_sentences,
    total_examples=len(tokenized_sentences),
    epochs=10,
)
In the preceding code fragment, the model is trained on the same set of data as word2vec.
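To see the advantage this brings, we can ask the trained model for the vector of a token it has never encountered. The snippet below is a minimal, self-contained sketch: the toy corpus and the out-of-vocabulary word cats are illustrative stand-ins, not part of the original example, standing in for the tokenized_sentences used above:

from gensim.models import FastText

# a tiny stand-in corpus for tokenized_sentences (illustrative only)
tokenized_sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

model = FastText(vector_size=4, window=3, min_count=1)
model.build_vocab(corpus_iterable=tokenized_sentences)
model.train(
    corpus_iterable=tokenized_sentences,
    total_examples=len(tokenized_sentences),
    epochs=10,
)

# "cats" never occurs in the corpus, so it is not in the vocabulary...
print("cats" in model.wv.key_to_index)  # False

# ...but FastText still composes a vector for it from character n-grams
print(model.wv["cats"])

Because FastText represents each word as a bag of character n-grams, the vector for cats is assembled from subword pieces (such as cat and ats) that were learned during training, which is exactly what makes lookups for unknown tokens possible.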