Topic Modeling
One of the many tasks in NLU, which is itself a part of NLP, is extracting the meaning of a sentence, a paragraph, or a whole document. One way to understand a document is through its topics. For example, if a set of documents comes from a newspaper, the topics might be politics or sports. Topic modeling techniques produce groups of words, with each group representing one topic. Different sets of documents yield different topics, represented by different words. The goal of these techniques is to discover the different types of documents in your corpus.
Term Frequency – Inverse Document Frequency (TF-IDF)
TF-IDF is a commonly used NLP model for extracting the most important words from a document. To rank the words, the algorithm assigns a weight to each one. The idea is to ignore words that have no relevance to the global concept (that is, the overall topic of the text), so those terms will be...
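The weighting idea can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation. It assumes one common textbook formulation that the text does not spell out: term frequency is a word's count divided by the document length, and inverse document frequency is the logarithm of the number of documents divided by the number of documents containing the word. The function name `tf_idf` and the three-sentence corpus are hypothetical, made up for this example. Library implementations typically add smoothing and normalization on top of this.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document.

    Assumed variant (not taken from the text):
      tf  = count of the word in the document / document length
      idf = log(number of documents / documents containing the word)
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the team won the match",
    "the senate passed the bill",
    "the match ended in a draw",
]
scores = tf_idf(corpus)

# Print the words of the first document, highest weight first.
for word, weight in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{word}: {weight:.3f}")
```

In this toy corpus, "the" occurs in every document, so its inverse document frequency is log(1) = 0 and its weight is zero. "won" occurs in only one document and therefore gets a higher weight than "match", which occurs in two.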