MLE and MAP Learning
In many statistical learning tasks, our goal is to find the optimal parameter set according to a maximization criterion. The most common approach is based on the likelihood and is called maximum likelihood estimation (MLE).
In fact, given a statistical model parametrized with the vector $\theta$, the likelihood can be interpreted as the probability of such a model generating the training data. Therefore, given a suitable structure of $\theta$, the MLE provides a simple but extremely effective tool to define a generative model that is not biased by prior beliefs. For our purposes, let's suppose we have a data-generating process $p_{data}$, used to draw a dataset $X$ of $N$ samples:

$$X = \{x_1, x_2, \ldots, x_N\} \quad \text{where } x_i \sim p_{data}$$

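In this case, the optimal parameter set that maximizes the likelihood of a generic statistical model $M$ parametrized with $\theta$ is found as follows:

$$\theta_{MLE} = \underset{\theta}{\operatorname{argmax}} \; L(\theta \mid X) = \underset{\theta}{\operatorname{argmax}} \; p(X \mid M, \theta)$$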

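To make the maximization concrete, the following is a minimal sketch that assumes the statistical model is a univariate Gaussian, so that $\theta = (\mu, \sigma)$. This choice, the synthetic data-generating process, and the helper names `gaussian_log_likelihood` and `gaussian_mle` are illustrative assumptions and are not taken from the text. For this particular model the maximizer is available in closed form (the sample mean and the square root of the sample variance computed with a divisor of N), and the sketch checks that moving away from it lowers the log-likelihood.

```python
import math
import random


def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood log p(X | mu, sigma) of i.i.d. samples under N(mu, sigma^2)."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - squared_error / (2.0 * sigma ** 2))


def gaussian_mle(data):
    """Closed-form maximizer of the Gaussian log-likelihood."""
    n = len(data)
    mu_hat = sum(data) / n
    # The MLE of the variance divides by n (not n - 1).
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
    return mu_hat, sigma_hat


# Hypothetical p_data: a Gaussian with mean 2.0 and standard deviation 1.5.
rng = random.Random(1000)
X = [rng.gauss(2.0, 1.5) for _ in range(5000)]

mu_mle, sigma_mle = gaussian_mle(X)
best_ll = gaussian_log_likelihood(X, mu_mle, sigma_mle)

# Any other parameter choice should yield a lower log-likelihood.
perturbed_ll = gaussian_log_likelihood(X, mu_mle + 0.1, sigma_mle * 1.1)

print(f"mu_MLE = {mu_mle:.3f}, sigma_MLE = {sigma_mle:.3f}")
print(f"log-likelihood at MLE: {best_ll:.1f}, "
      f"at perturbed parameters: {perturbed_ll:.1f}")
```

The log-likelihood is used instead of the likelihood itself because the logarithm is monotonic, so the maximizer is unchanged, while the product of many small densities would otherwise underflow. The estimates depend only on the drawn samples, with no prior term entering the computation.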
This approach has the advantage of being unbiased by incorrect preconditions, because the optimal value depends exclusively on the observed data. However, at the same time, this approach...