MLE and MAP Learning
In many statistical learning tasks, our goal is to find the optimal parameter set according to a maximization criterion. The most common approach is based on the likelihood and is called maximum likelihood estimation (MLE).
In fact, given a statistical model parametrized with the vector $\theta$, the likelihood can be interpreted as the probability of such a model generating the training data. Therefore, given a suitable model structure, MLE provides a simple but extremely effective tool to define a generative model that does not depend on any prior belief about the parameters. For our purposes, let's suppose we have a data-generating process $p_{data}$, used to draw a dataset $X$:

$$X = \{x_1, x_2, \ldots, x_N\} \quad \text{where} \quad x_i \sim p_{data}$$
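In this case, the optimal parameter set $\theta_{opt}$ that maximizes the likelihood of a generic statistical model parametrized with $\theta$ is found as follows:

$$\theta_{opt} = \underset{\theta}{\operatorname{argmax}} \; L(\theta \mid X) = \underset{\theta}{\operatorname{argmax}} \; p(X \mid \theta)$$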
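As a concrete illustration of this maximization, the following is a minimal sketch under assumptions that are not part of the text: the samples are independent and identically distributed, and the statistical model is a univariate Gaussian with parameters (mu, sigma). For that specific model the maximizer of the likelihood has a well-known closed form (the sample mean and the 1/N standard deviation). The function names gaussian_log_likelihood and gaussian_mle, and the values 2.0 and 1.5 used to simulate the data-generating process, are hypothetical and chosen only for the example.

```python
import math
import random


def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of i.i.d. samples under N(mu, sigma^2).

    Working with the logarithm turns the product of densities into a sum
    and does not change the location of the maximum.
    """
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - squared_error / (2.0 * sigma ** 2))


def gaussian_mle(data):
    """Closed-form likelihood maximizer for the assumed Gaussian model."""
    n = len(data)
    mu_hat = sum(data) / n
    # The MLE of the variance divides by N, not by N - 1.
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
    return mu_hat, sigma_hat


# Simulated stand-in for the data-generating process (assumed values).
rng = random.Random(0)
X = [rng.gauss(2.0, 1.5) for _ in range(5000)]

mu_hat, sigma_hat = gaussian_mle(X)
ll_best = gaussian_log_likelihood(X, mu_hat, sigma_hat)
# Any other parameter choice should give a lower log-likelihood.
ll_shifted = gaussian_log_likelihood(X, mu_hat + 0.5, sigma_hat)

print(f"mu_hat={mu_hat:.3f}, sigma_hat={sigma_hat:.3f}")
print(f"log-likelihood at the MLE: {ll_best:.1f}, at a shifted mean: {ll_shifted:.1f}")
```

The estimates depend only on the drawn samples. For models without a closed-form solution, the same log-likelihood function would typically be handed to a numerical optimizer instead.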
This approach has the advantage of not being biased by incorrect prior assumptions, because the optimal value depends exclusively on the observed data. However, at the same time, this approach...