In this section, we will briefly discuss the idea behind Bayesian learning from a mathematical perspective, which is the core of the probabilistic models for one-shot learning. The overall goal of Bayesian learning is to model the distribution of the parameters, θ, given the training data, that is, to learn the distribution, P(θ | data).
In the probabilistic view of machine learning, we try to solve the following equation:

$$P(\theta \mid \text{data}) = \frac{P(\text{data} \mid \theta)\,P(\theta)}{P(\text{data})}$$
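As a quick illustration of how this equation can be evaluated in practice, here is a minimal sketch; the coin-flip data, the grid over θ, and the uniform prior are all assumptions made for this example, not something from the text. It computes the posterior over a coin's bias θ on a discrete grid, term by term:

```python
import numpy as np

# Hypothetical toy data: a few coin flips, 1 = heads, 0 = tails.
data = np.array([1, 1, 0, 1, 1])

# Discretize theta so each term of Bayes' theorem can be evaluated directly.
theta_grid = np.linspace(0.01, 0.99, 99)

# P(theta): a flat (uniform) prior over the grid -- an assumption for this sketch.
prior = np.ones_like(theta_grid) / theta_grid.size

# P(data | theta): Bernoulli likelihood of the observed flips for each theta.
heads, tails = data.sum(), (1 - data).sum()
likelihood = theta_grid**heads * (1 - theta_grid)**tails

# P(data): the evidence, obtained by summing over all values of theta.
evidence = np.sum(likelihood * prior)

# P(theta | data): the posterior given by Bayes' theorem.
posterior = likelihood * prior / evidence

print("Posterior peaks at theta =", theta_grid[np.argmax(posterior)])
```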
In this setting, we try to find the best set of parameters, theta (θ), that would explain the data. Consequently, we maximize the given equation over θ:

$$\theta^{*} = \arg\max_{\theta} P(\theta \mid \text{data}) = \arg\max_{\theta} \frac{P(\text{data} \mid \theta)\,P(\theta)}{P(\text{data})}$$
We can take the logarithm on both sides, which does not affect the optimization problem because the logarithm is monotonically increasing, but makes the math easier and more tractable:

$$\log P(\theta \mid \text{data}) = \log P(\text{data} \mid \theta) + \log P(\theta) - \log P(\text{data})$$
We can drop log P(data) from the right-hand side of the equation, as it does not depend on θ for the optimization problem, and consequently the optimization problem becomes:

$$\theta^{*} = \arg\max_{\theta} \left[ \log P(\text{data} \mid \theta) + \log P(\theta) \right]$$

In other words, we maximize the sum of the log-likelihood and the log-prior; this is known as maximum a posteriori (MAP) estimation.
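To make this final objective concrete, the following is a minimal sketch that finds θ* by numerically maximizing log P(data | θ) + log P(θ). The toy data, the known likelihood scale, and the Gaussian prior are all assumptions made for this example, not something from the text:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Hypothetical toy problem: estimate the mean theta of a Gaussian with known
# standard deviation 1.0, from a few assumed samples.
data = np.array([2.3, 1.9, 2.8, 2.1])

# P(theta): a Gaussian prior centered at 0 -- an assumption for this sketch.
prior = norm(loc=0.0, scale=2.0)

def negative_log_posterior(theta):
    # -[log P(data | theta) + log P(theta)]; log P(data) is dropped, as above.
    log_likelihood = norm(loc=theta, scale=1.0).logpdf(data).sum()
    log_prior = prior.logpdf(theta)
    return -(log_likelihood + log_prior)

# Minimizing the negative log-posterior maximizes the MAP objective.
result = minimize_scalar(negative_log_posterior)
print("MAP estimate theta* =", result.x)
```

Note that with a flat prior, the log P(θ) term is constant in θ, and the same procedure reduces to ordinary maximum-likelihood estimation.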