Hierarchical Dirichlet Process (HDP)
HDP is a non-parametric variant of LDA. It is called "non-parametric" because the number of topics is not supplied by us but is inferred from the data. In other words, the number of topics is learned during training and can increase; in theory, it is unbounded.
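To make the idea of an unbounded, data-driven number of groups more concrete, here is a minimal sketch of the Chinese restaurant process, a standard way of picturing the Dirichlet process that HDP builds on. It is an illustration only and not how any of the libraries discussed here implement HDP. The function name crp_table_counts and the parameters alpha and seed are assumptions made for this example.

```python
import random


def crp_table_counts(n_customers, alpha, seed=0):
    """Seat customers one at a time and return the size of each table.

    Hypothetical helper for illustration: each "table" plays the role of
    a topic, and `alpha` controls how readily a new table is opened.
    """
    rng = random.Random(seed)
    tables = []  # tables[i] = number of customers seated at table i
    for n in range(n_customers):
        # n customers are already seated, so the total weight is n + alpha.
        r = rng.uniform(0, n + alpha)
        cumulative = 0.0
        for i, size in enumerate(tables):
            cumulative += size
            if r < cumulative:
                # Join an existing table with probability size / (n + alpha).
                tables[i] += 1
                break
        else:
            # Open a new table with probability alpha / (n + alpha).
            tables.append(1)
    return tables


# The number of tables is never fixed in advance; it is whatever the
# seating process produces for the amount of data seen.
for n in (10, 100, 1000):
    print(n, "customers ->", len(crp_table_counts(n, alpha=1.0)), "tables")
```

The number of tables is an outcome of the process rather than an input, which is the sense in which the number of topics in HDP is learned. A larger alpha makes new tables more likely.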
The tomotopy HDP implementation can infer between 1 and 32,767 topics, whereas gensim's HDP implementation appears to fix the number of topics at 150. For our purposes, we will use the tomotopy HDP implementation.
The gensim and scikit-learn libraries use variational inference, while the tomotopy library uses collapsed Gibbs sampling. When the time required by collapsed Gibbs sampling is not an issue, it is preferable to variational inference; otherwise, we may prefer variational inference. For the tomotopy library, the following parameters are used:
iter: This refers to the number of iterations that...