The SGD implementation in Spark MLlib's GradientDescent class uses a simple distributed sampling of the data examples. Recall that the loss part of the optimization problem is
$$\frac{1}{n}\sum_{i=1}^{n} L(w; x_i, y_i),$$
and therefore
$$\frac{1}{n}\sum_{i=1}^{n} L'_{w,i}$$
is a true subgradient, where $L'_{w,i}$ denotes a subgradient of the loss at the $i$-th example with respect to the weights $w$. Computing it requires access to the full dataset, which is costly when the dataset is large.
The parameter miniBatchFraction instead specifies the fraction of the full data to use. The average of the gradients over this subset,
$$\frac{1}{|S|}\sum_{i \in S} L'_{w,i},$$
is a stochastic gradient. Here $S$ is the sampled subset, of size $|S| = \text{miniBatchFraction} \cdot n$.
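To make the mini-batch average concrete, here is a minimal sketch in plain Scala, without Spark. It is not MLlib's implementation. The names MiniBatchSketch, pointGradient, sample, miniBatchGradient and step are hypothetical; the squared loss and the constant step size are assumptions chosen for illustration, and the regularizer is left out. Each example is kept independently with probability miniBatchFraction, so the size of S is approximately miniBatchFraction times n rather than exactly that.

```scala
import scala.util.Random

// Hypothetical helper object for illustration only; not part of Spark MLlib.
object MiniBatchSketch {
  // One training example: (label y, feature vector x).
  type Example = (Double, Array[Double])

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (u, v) => u * v }.sum

  // Assumed loss: 0.5 * (w.x - y)^2, whose gradient in w is (w.x - y) * x.
  def pointGradient(w: Array[Double], ex: Example): Array[Double] = {
    val (y, x) = ex
    val err = dot(w, x) - y
    x.map(_ * err)
  }

  // Keep each example independently with probability miniBatchFraction,
  // so the expected size of the subset S is miniBatchFraction * n.
  def sample(data: Seq[Example], miniBatchFraction: Double, rng: Random): Seq[Example] =
    data.filter(_ => rng.nextDouble() < miniBatchFraction)

  // Average of the per-example gradients over S: (1/|S|) * sum over i in S.
  def miniBatchGradient(w: Array[Double], batch: Seq[Example]): Array[Double] = {
    require(batch.nonEmpty, "mini-batch must not be empty")
    val sum = Array.fill(w.length)(0.0)
    for (ex <- batch) {
      val g = pointGradient(w, ex)
      for (j <- g.indices) sum(j) += g(j)
    }
    sum.map(_ / batch.size)
  }

  // One SGD step with a constant step size; an empty batch leaves w unchanged.
  def step(w: Array[Double], batch: Seq[Example], stepSize: Double): Array[Double] =
    if (batch.isEmpty) w
    else w.zip(miniBatchGradient(w, batch)).map { case (wi, gi) => wi - stepSize * gi }
}
```

A training loop would call sample with a fresh draw on every iteration and pass the result to step. Setting miniBatchFraction to 1.0 keeps every example, which recovers the full gradient described above.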
In the following code, we show how to use stochastic gradient descent on a mini-batch to calculate the weights and the loss. The output of this program is a vector of weights and the loss.
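import org.apache.spark.SparkContext

object SparkSGD {
  def main(args: Array[String]): Unit = {
    val m = 4
    val n = 200000
    // Local Spark context running with two worker threads
    val sc = new SparkContext("local[2]", "")
    // Distribute the indices 0 until m across 2 partitions
    val points = sc.parallelize(0 until m, 2).mapPartitionsWithIndex { (idx, iter) =>
      val random...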