Coding – TF-IDF
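TF-IDF assigns a weight to each term in a document. With the unit-length normalization that Gensim applies by default, each weight falls between 0 and 1. A high TF-IDF value indicates that a term is distinctive for its document, and a low value indicates that the term is common across the corpus. Let’s start with Gensim to see how it computes TF-IDF.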
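As a quick, library-independent illustration of how such a score behaves, here is a minimal sketch in plain Python. It assumes one common variant of the formula: raw term count times log2(N / document frequency), followed by scaling each document to unit length. The toy tokenized documents and the function name tfidf_scores are made up for this illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus, already tokenized and lowercased.
docs = [
    ["start", "spreading", "the", "news"],
    ["the", "heart", "of", "new", "york"],
    ["new", "york", "new", "york"],
]

def tfidf_scores(docs):
    """Return one {term: weight} dict per document (illustrative variant)."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    scored = []
    for doc in docs:
        tf = Counter(doc)
        # Raw weight: term count times log2(N / df).
        raw = {t: c * math.log2(n_docs / df[t]) for t, c in tf.items()}
        # Scale the document vector to unit (L2) length.
        norm = math.sqrt(sum(w * w for w in raw.values()))
        scored.append({t: (w / norm if norm else 0.0) for t, w in raw.items()})
    return scored

scores = tfidf_scores(docs)
for doc_scores in scores:
    print({t: round(w, 3) for t, w in doc_scores.items()})
```

In the first document, "the" also occurs in another document, so it receives a lower weight than "start", which occurs nowhere else.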
Gensim for TF-IDF
The Gensim class for TF-IDF is TfidfModel. TF-IDF is built on BoW. Because we already built BoW in the Coding – BoW section, all we need to do now is pass the BoW results to the TF-IDF model. I will reprint the code from the Coding – BoW section so you do not need to flip back and forth. There are five strings in doc_list:
doc_list = [
    "Start spreading the news",
    "You're leaving today (tell him friend)",
    "I want to be a part of it, New York, New York",
    "Your vagabond shoes, they are longing to stray",
    "And steps around the heart of it, New York, New York",
]
The first number in the 2-tuple is the word index, and the second number is the word count:
BoW_corpus...