In this section, we'll define training a NN as the process of adjusting its parameters (weights) θ in a way that minimizes the cost function J(θ). The cost function is a performance measure computed over a training set of multiple samples, each represented as a vector. Each vector has an associated label (supervised learning). Most commonly, the cost function measures the difference between the network output and the label.
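As a concrete illustration of the last point, here is a minimal sketch (not from the text) of one common cost function, the mean squared error, computed for a linear model over a tiny labeled training set; the names `cost`, `X`, `y`, and `theta` are our own:

```python
import numpy as np

def cost(theta, X, y):
    """Mean squared error J(theta): average squared difference
    between the model output and the labels."""
    y_hat = X @ theta                 # model output for each sample vector
    return np.mean((y_hat - y) ** 2)  # difference between output and label

# Two training samples (vectors) with their supervised labels.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([5.0, 11.0])

theta = np.array([1.0, 2.0])
print(cost(theta, X, y))  # 0.0 -- this theta fits the data exactly
```

Training then amounts to searching for the θ that drives J(θ) toward its minimum, which is what gradient descent does.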
We'll start this section with a short recap of the gradient descent optimization algorithm. If you're already familiar with it, feel free to skip ahead.