Let's formally define the problem statement for the backward algorithm. In this case, we are trying to compute the probability of observation variables given the current state:
Backward algorithm: P(Xk+1:n|Zk)
Similar to what we did in the case of the forward algorithm, we will again try to convert this probability term into a recursive equation in terms of known distributions, so that we can recursively compute the probabilities at different time instances. First, we introduce a new term, Zk+1 in P(Xk+1:n|Zk), using the marginalization rule:
Here, we are marginalizing over the Zk+1 variable by summing over all of its possible states, which we have assumed to be m. Now, we can use the product rule of probability to split the preceding equation as:
Now, from the d-separation property of our model, we...