Understanding overfitting
General overfitting occurs when a very complex statistical model suits the observed data because it has too many parameters compared to the number of observations. The risk is that an incorrect model can perfectly fit data, just because it is quite complex compared to the amount of data available. Although, it is possible for overfitting to occur when the amount of data is adequate. Consequently, when the model is used to predict new observations, there is a problem, because it is not able to generalize.
The concept of overfitting is also very important in regression analysis. Usually, a learning algorithm is trained using a set of examples (training set), the output of which is already known. It is assumed that the learning algorithm will reach a state in which it will be able to predict outputs for all the other examples it has not yet seen, assuming that the learning model will be able to generalize.
However, especially in cases where there is a small number of...