We covered the basics of Bayes' rule in Chapter 6, Predicting Stock Prices using Gaussian Process Regression.
For Bayesian machine learning, we use the same Bayes' rule to learn the model parameters (θ) from the given data, D. The formula then looks like this:

P(θ | D) = P(D | θ) P(θ) / P(D)
Here, P(D), the probability of the observed data, is also called the evidence. This is always difficult to compute. One brute-force way is to integrate out θ, that is, to evaluate P(D) = ∫ P(D | θ) P(θ) dθ over all values of the model parameters, but this is obviously too expensive to evaluate in high-dimensional parameter spaces. P(θ) is the prior on the parameters, which in most cases is just some loosely specified initial distribution over the parameters. Generally, we don't worry about setting the prior perfectly, as we expect the inference procedure to converge to the right values of the parameters.
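For low-dimensional parameters, the integral for the evidence can be approximated directly on a grid. The following sketch applies Bayes' rule to a single coin-bias parameter θ; the data, grid resolution, and flat prior are illustrative choices, not part of any particular model from this book:

```python
import numpy as np

# Hypothetical observed data: 35 heads and 15 tails from a biased coin.
data = np.array([1] * 35 + [0] * 15)
heads, tails = data.sum(), len(data) - data.sum()

# Grid of candidate parameter values and a flat prior P(theta).
theta = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(theta) / len(theta)

# Likelihood P(D | theta) for Bernoulli observations.
likelihood = theta**heads * (1 - theta)**tails

# Evidence P(D): integrating out theta. Cheap on a 1-D grid,
# but this brute-force sum blows up in high dimensions.
evidence = np.sum(likelihood * prior)

# Posterior P(theta | D) by Bayes' rule.
posterior = likelihood * prior / evidence

print(theta[np.argmax(posterior)])  # MAP estimate, near 35/50 = 0.7
```

Note that the posterior peaks near the empirical frequency even though the prior was uninformative, which is why a roughly chosen prior is usually acceptable when enough data is available.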
P(D | θ) is known as the likelihood of...