Drawbacks of the TF-IDF model
Suppose a customer on an e-commerce website is searching for a jacket and intends to purchase one with a unique design. The keyword entered is unique jacket. What happens at the Solr end?
http://solr.server/solr/clothes/?q=unique+jacket
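The request above can also be assembled programmatically. The following is a minimal sketch, assuming the host and core shown in the URL; the helper name build_query_url is hypothetical and only illustrates how the space in the search phrase becomes a + in the query string.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, keywords: str) -> str:
    """Return base_url with the keywords encoded as the q parameter.

    build_query_url is a hypothetical helper for illustration only.
    urlencode form-encodes the value, so spaces are sent as '+'.
    """
    return base_url + "?" + urlencode({"q": keywords})


# Host and core are taken from the example URL in the text.
url = build_query_url("http://solr.server/solr/clothes/", "unique jacket")
print(url)
```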
Now, unique is a comparatively rare keyword: fewer items or documents would mention unique in their description. Let us see how this affects the ranking of our results under the TF-IDF scoring algorithm. The scoring algorithm, revisited with respect to this query, is shown in the following diagram:
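For reference, the classic Lucene practical scoring function, as described in the Lucene TFIDFSimilarity documentation, is commonly written as follows. This is a sketch in LaTeX notation rather than a reproduction of the figure; the exact definitions of the tf, idf, and norm factors depend on the similarity implementation in use.

```latex
\mathrm{score}(q,d) =
  \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^{2}
  \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \Big)
```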
The following parameters in the scoring formula do not affect the ranking of the documents in the query result:
coord(q,d): This would be constant for a MUST query. Here we are searching for both unique and jacket, so all documents will have both keywords, and the coord(q,d) value will be the same for all documents.
queryNorm(q): This is used to make the scores from different...