In this section, we will briefly discuss the idea behind Bayesian learning from a mathematical perspective, which is the core of the probabilistic models for one-shot learning. The overall goal of Bayesian learning is to model the distribution of the parameters, $\theta$, given the training data, that is, to learn the distribution $P(\theta \mid \text{data})$.
In the probabilistic view of machine learning, we try to solve the following equation:
$$P(\theta \mid \text{data}) = \frac{P(\text{data} \mid \theta)\,P(\theta)}{P(\text{data})}$$
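To make these terms concrete, here is a minimal numerical sketch of Bayes' rule. The coin-flip setup, the grid of candidate values, and all variable names are illustrative assumptions, not from the text:

```python
import numpy as np

# Hypothetical example: infer the bias (theta) of a coin from observed flips.
# Here, data = 8 heads out of 10 flips; theta is discretized on a grid.
theta = np.linspace(0.01, 0.99, 99)                       # candidate parameters
heads, flips = 8, 10

likelihood = theta**heads * (1 - theta)**(flips - heads)  # P(data | theta)
prior = np.ones_like(theta) / len(theta)                  # uniform P(theta)
evidence = np.sum(likelihood * prior)                     # P(data), marginalized over theta
posterior = likelihood * prior / evidence                 # P(theta | data)

print(posterior.sum())  # ~1.0: the posterior is a valid distribution over the grid
```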
In this setting, we try to find the best set of parameters, theta ($\theta$), that would explain the data. Consequently, we maximize the given equation over $\theta$:
$$\theta_{MAP} = \operatorname*{argmax}_{\theta} P(\theta \mid \text{data}) = \operatorname*{argmax}_{\theta} \frac{P(\text{data} \mid \theta)\,P(\theta)}{P(\text{data})}$$
We can take the logarithm of both sides, which does not affect the optimization problem but makes the math easier and more tractable:
$$\theta_{MAP} = \operatorname*{argmax}_{\theta} \left[\log P(\text{data} \mid \theta) + \log P(\theta) - \log P(\text{data})\right]$$
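Beyond tractability, the logarithm also helps numerically: the likelihood is typically a product of many small per-sample probabilities, which underflows in floating point, while the sum of logs stays well behaved. A small illustrative sketch (the numbers are assumptions for demonstration):

```python
import numpy as np

probs = np.full(1000, 0.1)   # 1,000 per-sample likelihoods of 0.1 each
print(np.prod(probs))        # 0.0 -- the raw product underflows to zero
print(np.log(probs).sum())   # ~-2302.6 -- the log-likelihood remains usable
```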
We can drop $\log P(\text{data})$ from the right-hand side of the equation, as it does not depend on $\theta$ for the optimization problem, and consequently, the optimization problem reduces to the following:

$$\theta_{MAP} = \operatorname*{argmax}_{\theta} \left[\log P(\text{data} \mid \theta) + \log P(\theta)\right]$$
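Continuing the earlier coin-flip sketch (same illustrative assumptions), the MAP estimate is then just an argmax over the log-likelihood plus the log-prior, with $\log P(\text{data})$ omitted because it is constant in $\theta$:

```python
import numpy as np

theta = np.linspace(0.01, 0.99, 99)                    # same grid as before
heads, flips = 8, 10

log_likelihood = heads * np.log(theta) + (flips - heads) * np.log(1 - theta)
log_prior = np.log(np.ones_like(theta) / len(theta))   # uniform prior

# log P(data) is constant in theta, so it is dropped from the argmax.
theta_map = theta[np.argmax(log_likelihood + log_prior)]
print(theta_map)  # ~0.8, matching heads/flips under a uniform prior
```

Under the uniform prior, the MAP estimate coincides with the maximum likelihood estimate; a non-uniform prior would pull $\theta_{MAP}$ toward the values it favors.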