MLE and MAP Learning
In many statistical learning tasks, our goal is to find the optimal parameter set according to a maximization criterion. The most common approach is based on the likelihood and is called maximum likelihood estimation (MLE).
In fact, given a statistical model parametrized with the vector $\theta$, the likelihood can be interpreted as the probability of that model generating the training data. Therefore, given a suitable model structure, MLE provides a simple but extremely effective tool to define a generative model that is not biased by prior beliefs about the parameters. For our purposes, let's suppose we have a data-generating process $p_{data}$, used to draw a dataset $X$:
$$X = \{x_1, x_2, \ldots, x_N\} \quad \text{where} \quad x_i \sim p_{data}$$
In this case, the optimal parameter set that maximizes the likelihood of a generic statistical model parametrized with $\theta$ is found as follows:
$$\theta_{opt} = \underset{\theta}{\operatorname{argmax}} \; L(\theta \mid X) = \underset{\theta}{\operatorname{argmax}} \; p(X \mid \theta)$$
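As an illustration of this maximization, the following is a minimal sketch rather than a reference implementation. It assumes a univariate Gaussian model, a synthetic dataset, and SciPy's general-purpose `minimize` routine; none of these choices come from the text. The names `negative_log_likelihood`, `mu_mle`, and `sigma_mle` are hypothetical. Maximizing the likelihood is carried out by minimizing the average negative log-likelihood, which has the same optimum.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed data-generating process (illustrative only): a Gaussian
# with mean 2.0 and standard deviation 1.5.
rng = np.random.default_rng(1000)
X = rng.normal(loc=2.0, scale=1.5, size=5000)


def negative_log_likelihood(theta, x):
    """Average negative log-likelihood of i.i.d. samples under N(mu, sigma^2).

    theta = (mu, log_sigma). Optimizing log(sigma) keeps sigma positive
    without explicit constraints. Averaging instead of summing does not
    change the location of the optimum.
    """
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    return np.mean(
        0.5 * np.log(2.0 * np.pi) + log_sigma + 0.5 * ((x - mu) / sigma) ** 2
    )


# Numerical MLE: minimize the negative log-likelihood starting from (0, 0).
result = minimize(
    negative_log_likelihood, x0=np.array([0.0, 0.0]), args=(X,), method="BFGS"
)
mu_mle = result.x[0]
sigma_mle = np.exp(result.x[1])

# For a Gaussian, the MLE is also available in closed form: the sample mean
# and the (biased, ddof=0) sample standard deviation.
mu_closed = X.mean()
sigma_closed = X.std()

print(f"Numerical MLE:   mu={mu_mle:.4f}, sigma={sigma_mle:.4f}")
print(f"Closed-form MLE: mu={mu_closed:.4f}, sigma={sigma_closed:.4f}")
```

The numerical and closed-form estimates should agree closely, and both depend only on the samples in `X`. No prior information about the parameters enters the computation.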
This approach has the advantage of not being biased by incorrect prior assumptions, because the optimal value depends exclusively on the observed data. However, at the same time, this approach...