TF-IDF weighting
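TF-IDF combines term frequency (TF) and inverse document frequency (IDF) to produce a weight for each term in each document. The weight of term t in document d is the product of the two:

$$\text{tf-idf}_{t,d} = \text{tf}_{t,d} \times \text{idf}_t, \qquad \text{idf}_t = \log\frac{N}{\text{df}_t}$$

where $\text{tf}_{t,d}$ is the number of times t occurs in d, N is the number of documents in the collection, and $\text{df}_t$ is the number of documents that contain t.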
In other words, TF-IDF assigns term t a weight in document d that is:
- highest when t occurs many times within a small number of documents
- lower when t occurs fewer times in d, or occurs in many documents
- lowest when t occurs in virtually all documents, since its IDF then approaches 0 (and is exactly 0 if t occurs in every document)
- 0 when t does not occur in d at all, since its term frequency in d is 0
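The cases above can be checked with a minimal Python sketch. It assumes raw counts for TF and a base-10 logarithm for IDF; other TF variants and log bases are in common use, and changing the base only rescales the weights. The helper name `tf_idf` and the toy corpus are hypothetical, chosen so that each case in the list appears once.

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """Hypothetical helper: tf-idf weight of `term` in `doc`.

    Assumptions: `doc` is a list of tokens and is itself one of the
    documents in `corpus` (so df >= 1 whenever tf > 0); tf is the raw
    count of `term` in `doc`; idf = log10(N / df).
    """
    tf = Counter(doc)[term]
    if tf == 0:
        return 0.0  # term absent from this document: weight is 0
    df = sum(1 for d in corpus if term in d)  # number of documents containing the term
    idf = math.log10(len(corpus) / df)        # exactly 0 when the term is in every document
    return tf * idf

# Toy corpus (hypothetical); each document is already tokenized
corpus = [
    "the apple apple apple banana".split(),
    "the banana cherry".split(),
    "the cherry".split(),
    "the banana".split(),
]

doc = corpus[0]
for term in ["apple", "banana", "the", "zebra"]:
    print(f"{term:>6}: {tf_idf(term, doc, corpus):.3f}")
```

Run as-is, it prints 1.806 for apple (three occurrences, in one of four documents), 0.125 for banana (one occurrence, in three of four documents), and 0.000 for both the (in every document) and zebra (absent from the document).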