TF-IDF weighting
TF-IDF combines term frequency (TF) and inverse document frequency (IDF) to produce a single weight for each term in a document. The weight is computed with the following formula:
![](https://static.packt-cdn.com/products/9781788993494/graphics/b0eddd5d-b6ef-47af-b717-9f9edd5c7980.png)
In other words, it assigns a weight to term t in document d as follows:
- If term t occurs many times within a small number of documents, the weight is highest
- If term t occurs only a few times in a document, or occurs in many documents, the weight is lower
- If term t occurs in all documents, the weight is lowest
- If term t occurs in no documents, the weight is 0, because its term frequency is 0 in every document
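The four cases above can be checked with a minimal sketch. It assumes the common textbook variant, in which TF is the raw count of the term in the document and IDF is log(N / df), where N is the number of documents and df is the number of documents containing the term. The formula shown earlier may use a different variant (for example, a smoothed IDF), and the `tf_idf` function and the toy `docs` collection are hypothetical names introduced only for illustration.

```python
import math
from collections import Counter


def tf_idf(term, doc, docs):
    """Return the TF-IDF weight of `term` in `doc` for the collection `docs`.

    Each document is a list of tokens.
    Assumption: TF is the raw count and IDF is log(N / df).
    """
    tf = Counter(doc)[term]                 # occurrences of the term in this document
    df = sum(1 for d in docs if term in d)  # number of documents containing the term
    if df == 0:
        return 0.0                          # term occurs in no documents
    idf = math.log(len(docs) / df)
    return tf * idf


# Toy collection (hypothetical data)
docs = [
    ["apple", "apple", "apple", "the"],
    ["banana", "apple", "the"],
    ["cherry", "the"],
    ["date", "the"],
]

print(tf_idf("apple", docs[0], docs))  # many occurrences, few documents: highest
print(tf_idf("apple", docs[1], docs))  # a single occurrence: lower
print(tf_idf("the", docs[0], docs))    # occurs in every document: lowest
print(tf_idf("kiwi", docs[0], docs))   # occurs in no documents: 0
```

Under this particular variant, a term that occurs in every document gets an IDF of log(1) = 0, so the "lowest" case and the "no documents" case both produce a weight of 0. Smoothed IDF variants keep the weight for ubiquitous terms small but nonzero.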