Regularization – Ridge and Lasso
Regularization is an important concept in machine learning: it is used to counteract overfitting. In the world of big data, it is easy to overfit a model to the training set. When this happens, the model often performs badly on the test set, as indicated by mean_squared_error or some other error metric.
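To make this concrete, here is a minimal sketch (assuming scikit-learn and a synthetic dataset of my own invention) of how a held-out test set exposes overfitting: a high-degree polynomial fits the training data almost perfectly, yet its mean_squared_error on unseen data is far worse.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data: a noisy sine curve (hypothetical example data).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)

# Hold out a test set so the model can be scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A degree-15 polynomial has far more flexibility than the data warrants,
# so it tends to memorize the training points.
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)

train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

The gap between the two numbers is the signature of overfitting; regularized models such as Ridge and Lasso shrink that gap by penalizing overly flexible fits.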
You may wonder why a test set is kept aside at all. Wouldn’t the most accurate ML model come from fitting the algorithm on all the data?
The answer, generally accepted by the ML community after research and experimentation, is no.
There are two main problems with fitting an ML model on all the data:
- There is no way to test the model on unseen data. ML models are powerful when they make good predictions on new data. Models are trained on known results, but in the real world they must perform on data that has never been seen before. It's not vital to see how well a model fits known results (the training set), but it's absolutely essential to see how well it performs on unseen data (the test set).