Predicting new documents
To classify a new document into one of the topics, we only need to call .transform(). As discussed earlier, the difference between .fit_transform() and .transform() is that .fit_transform() trains the model and then predicts topics for the training documents, whereas .transform() only predicts topics for new documents using the already-trained model. Let's load the test dataset:
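The fit/transform split can be illustrated with a deliberately tiny stand-in. The sketch below is hypothetical: ToyClusterTopicModel is not part of BERTopic, it works on 2-D points instead of document embeddings, and it uses a basic k-means instead of BERTopic's own pipeline. It only shows the contract: fit_transform() learns state (here, centroids) and then predicts, while transform() reuses that learned state without changing it.

```python
import math


class ToyClusterTopicModel:
    """Hypothetical illustration of the fit/transform contract; NOT BERTopic."""

    def __init__(self, n_topics=2, n_iter=10):
        self.n_topics = n_topics
        self.n_iter = n_iter
        self.centroids_ = None  # learned state, set only by fit_transform()

    @staticmethod
    def _nearest(point, centroids):
        # Index of the closest centroid (Euclidean distance)
        return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

    def fit_transform(self, points):
        # Training step: a tiny k-means with deterministic initialisation
        centroids = [list(p) for p in points[: self.n_topics]]
        for _ in range(self.n_iter):
            labels = [self._nearest(p, centroids) for p in points]
            for k in range(self.n_topics):
                members = [p for p, lab in zip(points, labels) if lab == k]
                if members:
                    centroids[k] = [sum(c) / len(members) for c in zip(*members)]
        self.centroids_ = centroids
        # Prediction step on the same training data
        return self.transform(points)

    def transform(self, points):
        # Prediction only: the learned centroids are reused, never updated
        if self.centroids_ is None:
            raise RuntimeError("fit_transform() must be called before transform()")
        return [self._nearest(p, self.centroids_) for p in points]


train = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
model = ToyClusterTopicModel(n_topics=2)
print(model.fit_transform(train))                 # trains, then predicts: [0, 1, 0, 1]
print(model.transform([(0.1, 0.2), (4.8, 5.3)]))  # predicts only: [0, 1]
```

Calling transform() before fit_transform() raises an error, which mirrors why a saved, already-trained model has to be loaded before predicting on new documents.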
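import pandas as pd

# path is the data directory defined earlier
test = pd.read_csv(path + "/ag_news_test.csv")
# BERTopic expects a list of document strings
test_docs = test['Description'].tolist()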
Next, we load the model we trained and saved earlier by using the .load() method:
from bertopic import BERTopic

# Load the previously trained and saved topic model
model = BERTopic.load(path + "/ag_news_bertopic")
Predicting topics for the test documents then takes a single call:
# Assign each test document to one of the learned topics
predicted_topics, predicted_probs = model.transform(test_docs)
print(predicted_topics)
The output is as follows:
[1, 6, 3, -1, 3, 15, -1, -1, 100, 100, 68, 12, 12, 3, 3, 3, 6, 6, 3, 10, 25, -1,…]
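In BERTopic, the topic ID -1 is reserved for outliers: documents that the clustering step did not assign to any topic. The sketch below is a hypothetical helper, not a BERTopic function; it counts how often each topic was predicted and separates out the outliers. It uses only the first 22 predictions printed above, so its numbers describe that sample, not the full test set.

```python
from collections import Counter


def summarize_predictions(predicted_topics, outlier_id=-1):
    """Hypothetical helper: per-topic counts plus the number and share of outliers."""
    counts = Counter(predicted_topics)
    n_outliers = counts.pop(outlier_id, 0)  # remove -1 so it is not ranked as a topic
    share = n_outliers / len(predicted_topics) if predicted_topics else 0.0
    return counts.most_common(), n_outliers, share


# Only the first 22 predictions shown above (the full list is longer)
sample = [1, 6, 3, -1, 3, 15, -1, -1, 100, 100, 68, 12, 12, 3, 3, 3, 6, 6, 3, 10, 25, -1]
top, n_out, share = summarize_predictions(sample)
print(top[:3])                # [(3, 6), (6, 3), (100, 2)]
print(n_out, round(share, 3)) # 4 0.182
```

To see what a given topic ID stands for, BERTopic's model.get_topic(topic_id) returns that topic's top words together with their scores.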
BERTopic adopts a modular design approach that consists of five modules. You can choose techniques other than the default techniques...