Term Frequency-Inverse Document Frequency (TF-IDF) is a numerical statistic that captures how relevant a word is to a document, relative to the entire collection of documents. What does this mean? Some words appear many times within a single document and also across most documents, for example, the English words "the", "a", and "is". These words generally convey little information about the actual content of a document and don't make it stand out from the crowd. TF-IDF provides a way to weigh the importance of a word by considering how many times it appears in a document relative to how many documents in the collection contain it. Hence, commonly occurring words such as "the", "a", and "is" will have a low weight, and words more specific to a topic, such as "leopard", will have a higher weight.
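To make the intuition concrete, here is a minimal sketch in plain Python. It assumes one common textbook variant of the weighting: term frequency as the word count divided by the document length, and inverse document frequency as the natural logarithm of the number of documents divided by the number of documents containing the word. Libraries often use smoothed or normalized variants, so the exact numbers may differ in practice. The toy corpus and the helper name tf_idf are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
docs = [
    "the leopard is a big cat",
    "the dog is a pet",
    "the sky is blue",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: the number of documents that contain each word.
df = Counter(word for tokens in tokenized for word in set(tokens))


def tf_idf(word, tokens):
    """Weight of `word` in one tokenized document (assumed unsmoothed variant).

    Only valid for words that occur somewhere in the corpus, since the
    document frequency appears in the denominator.
    """
    tf = tokens.count(word) / len(tokens)   # share of the document taken by the word
    idf = math.log(n_docs / df[word])       # 0 when the word is in every document
    return tf * idf


# Weights for every word of the first document.
weights = {word: tf_idf(word, tokenized[0]) for word in tokenized[0]}

for word, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{word:8s} {weight:.4f}")
```

In this sketch, "the" and "is" occur in all three documents, so their weight is exactly zero, while "leopard" occurs in only one document and receives one of the highest weights. The word "a" occurs in two of the three documents and lands in between.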
TF-IDF is the product...