Introduction
Understanding how optimization works is fundamental to a successful career in machine learning. We picked the Gradient Descent (GD) method for an end-to-end deep dive to demonstrate the inner workings of an optimization technique. We will develop the concept over three recipes that take the developer from a from-scratch implementation to fully developed code that solves an actual problem with real-world data. The fourth recipe explores an alternative to GD that uses Spark and the normal equations (which scale poorly to big data problems) to solve a regression problem.
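To make the idea concrete before the recipes begin, here is a minimal sketch of batch gradient descent fitting a straight line to a handful of made-up points in plain Scala (no Spark). The data, learning rate, and iteration count are illustrative choices for this sketch, not the chapter's actual recipe code:

```scala
// A minimal sketch of batch gradient descent for simple linear regression.
// The toy data below is made up to roughly follow y = 2x + 1.
object GradientDescentSketch {
  def main(args: Array[String]): Unit = {
    val xs = Array(1.0, 2.0, 3.0, 4.0, 5.0)
    val ys = Array(3.1, 4.9, 7.2, 9.0, 11.1)

    var (m, b) = (0.0, 0.0) // the weights we are searching for
    val lr     = 0.01       // learning rate (step size), chosen for this sketch

    for (_ <- 1 to 2000) {
      // Prediction errors (predicted versus actual) over the training data
      val errors = xs.zip(ys).map { case (x, y) => (m * x + b) - y }
      // Gradients of the mean squared error with respect to m and b
      val gradM = 2.0 * xs.zip(errors).map { case (x, e) => x * e }.sum / xs.length
      val gradB = 2.0 * errors.sum / xs.length
      // Step downhill, against the gradient
      m -= lr * gradM
      b -= lr * gradB
    }
    println(f"Learned: y = $m%.2f * x + $b%.2f") // expect roughly y = 2x + 1
  }
}
```

For contrast, the normal-equations alternative explored in the fourth recipe solves for the same weights in a single closed-form step, computing (XᵀX)⁻¹Xᵀy; it is the cost of inverting XᵀX that limits how well that approach scales to big data problems.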
Let's get started. How does a machine learn, anyway? Does it really learn from its mistakes? What does it mean when the machine finds a solution using optimization?
At a high level, machines learn based on one of the following five techniques:
- Error-based learning: In this technique, we search the parameter space for a combination of parameter values (weights) that minimizes the total error (predicted versus actual) over the training data.
- Information...