As we saw in the previous section, in Bayesian learning we treat the model parameters as random variables, assign a prior to each, and then compute the posterior over them given the data. In the case of an HMM, this means we can place priors on the transition probabilities, the emission probabilities, or the number of observation states.
The first problem we need to solve, therefore, is selecting the prior. In theory, a prior can be any distribution over the parameters of the model, but in practice we usually choose a prior that is conjugate to the likelihood, so that the posterior has a closed-form solution. For example, when the output of the HMM is discrete, a common choice of prior is the Dirichlet distribution, mainly for two reasons: the first is that the Dirichlet distribution is a conjugate distribution to...
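To make the conjugacy argument concrete, here is a minimal sketch (not from the text, using NumPy) of the Dirichlet-categorical update that closed-form Bayesian inference over an HMM's transition rows relies on: with a Dirichlet(alpha) prior on one row of the transition matrix and observed transition counts n out of that state, the posterior is simply Dirichlet(alpha + n). The specific prior and count values below are illustrative assumptions.

```python
import numpy as np

def dirichlet_posterior(alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior + categorical counts
    -> Dirichlet(alpha + counts) posterior, in closed form."""
    return np.asarray(alpha, dtype=float) + np.asarray(counts, dtype=float)

# Symmetric Dirichlet(1, 1, 1) prior over 3 next-states for one
# transition row (illustrative choice).
alpha = np.ones(3)

# Hypothetical observed transitions out of this state:
# 5 to state 0, 2 to state 1, 3 to state 2.
counts = np.array([5, 2, 3])

posterior = dirichlet_posterior(alpha, counts)      # Dirichlet(6, 3, 4)

# Posterior mean of the transition probabilities:
# E[theta_j] = alpha_j' / sum(alpha'), a valid probability vector.
posterior_mean = posterior / posterior.sum()
```

Because the posterior is again a Dirichlet, the same update can be applied repeatedly as new transition counts arrive, which is exactly the closed-form convenience that motivates the conjugate choice.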