Coding Gradient Descent optimization to solve Linear Regression from scratch
In this recipe, we will explore how to code Gradient Descent (GD) to solve a Linear Regression problem. In the previous recipe, we demonstrated how to code GD to find the minimum of a quadratic function.
This recipe demonstrates a more realistic optimization problem, in which we minimize the least squares cost function to solve a linear regression problem in Scala on Apache Spark 2.0+. We will run our algorithm on real data and compare the result to that of a tier-1 commercially available statistical software package to demonstrate accuracy and speed.
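Before turning to the real data, it may help to see the idea in miniature. The following is a minimal sketch in plain Scala (no Spark) of batch gradient descent minimizing the least squares cost for a single-feature linear model. The object name `GradientDescentSketch`, the helper names `cost` and `fit`, the toy data, the learning rate, and the iteration count are all assumptions chosen for illustration. They are not the code or the data used later in this recipe.

```scala
object GradientDescentSketch {

  // Least squares cost: J(b0, b1) = (1 / 2n) * sum((b0 + b1 * x - y)^2)
  def cost(xs: Array[Double], ys: Array[Double], b0: Double, b1: Double): Double = {
    val n = xs.length
    xs.zip(ys).map { case (x, y) => math.pow(b0 + b1 * x - y, 2) }.sum / (2.0 * n)
  }

  // Batch gradient descent: each iteration uses every data point to compute
  // the gradient of J, then moves both coefficients against that gradient.
  def fit(xs: Array[Double], ys: Array[Double],
          learningRate: Double, iterations: Int): (Double, Double) = {
    val n = xs.length
    var b0 = 0.0 // intercept
    var b1 = 0.0 // slope
    for (_ <- 0 until iterations) {
      // Prediction error for each point under the current coefficients
      val errors = xs.zip(ys).map { case (x, y) => b0 + b1 * x - y }
      val grad0 = errors.sum / n                                  // dJ/db0
      val grad1 = errors.zip(xs).map { case (e, x) => e * x }.sum / n // dJ/db1
      b0 -= learningRate * grad0
      b1 -= learningRate * grad1
    }
    (b0, b1)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical toy data lying exactly on y = 1 + 2x (not the salary data)
    val xs = Array(0.0, 1.0, 2.0, 3.0, 4.0)
    val ys = xs.map(x => 1.0 + 2.0 * x)
    val (b0, b1) = fit(xs, ys, learningRate = 0.05, iterations = 5000)
    println(s"intercept=$b0 slope=$b1 cost=${cost(xs, ys, b0, b1)}")
  }
}
```

On this toy data the fitted intercept and slope should approach 1 and 2, the values used to generate it. With real data, the learning rate usually needs tuning: too large a value makes the updates diverge, while too small a value makes convergence slow.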
How to do it...
We start by downloading the file from Princeton University which contains the following data:
Source: Princeton University
- Download source: http://data.princeton.edu/wws509/datasets/#salary.
- To keep things simple, we then select the yr and sl columns to study how the number of years in rank influences the salary. To cut down on data wrangling code, we save those two columns in...