We will now study directed graphical models briefly before we delve into probabilistic models for one-shot learning. A directed graphical model (also known as a Bayesian network) is defined over random variables connected by directed edges, where each edge encodes a parent-child relationship. One such Bayesian network is shown in the following diagram:
The joint distribution over the random variables in this graph, S, R, L, W, and T, can be broken into a product of simpler distributions by the chain rule:
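The original equation is not reproduced here, but the chain rule itself fixes its form up to a choice of variable ordering. Assuming the ordering S, R, L, W, T (any ordering is valid), it reads:

```latex
P(S, R, L, W, T) = P(S)\, P(R \mid S)\, P(L \mid S, R)\, P(W \mid S, R, L)\, P(T \mid S, R, L, W)
```

Note that this is the fully general factorization; the graph structure is what later lets us drop variables from the conditioning sets.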
The conditional distributions on the right-hand side of the preceding equation have a large number of parameters. This is because each distribution is conditioned on several variables, and every joint configuration of those conditioning variables requires its own distribution over outcomes. The effect becomes even more pronounced further down the factorization, where the set of conditioning variables grows large. Consequently...
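The parameter blow-up can be made concrete with a short sketch. Assuming every variable is binary, a conditional probability table (CPT) conditioned on k variables holds one distribution per configuration of those k variables, i.e. 2^k rows, each with one free parameter (the helper name `cpt_free_parameters` below is illustrative, not from the text):

```python
# Sketch: counting free parameters in the chain-rule factorization
# P(S) P(R|S) P(L|S,R) P(W|S,R,L) P(T|S,R,L,W),
# assuming every variable is binary.

def cpt_free_parameters(num_parents: int, num_states: int = 2) -> int:
    """Free parameters of a CPT: one distribution per configuration of the
    conditioning variables, each with num_states - 1 free probabilities
    (the last one is determined by normalization)."""
    return (num_states ** num_parents) * (num_states - 1)

# Size of the conditioning set for each factor in the chain rule above.
conditioning_sizes = {"S": 0, "R": 1, "L": 2, "W": 3, "T": 4}

counts = {v: cpt_free_parameters(k) for v, k in conditioning_sizes.items()}
print(counts)                 # {'S': 1, 'R': 2, 'L': 4, 'W': 8, 'T': 16}
print(sum(counts.values()))   # 31, i.e. 2**5 - 1 for the full joint
```

The last factor alone already needs 16 parameters; with n binary variables the final factor needs 2^(n-1), which is exactly the exponential growth the text warns about.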