We just learned how gradient descent works and how to code the gradient descent algorithm from scratch for a simple two-layer network. But implementing gradient descent for a complex neural network is not a simple task, and debugging such an implementation is even more tedious. Surprisingly, even with a buggy gradient descent implementation, the network will still learn something; it just will not perform as well as it would with a bug-free implementation.
If the model does not raise any errors and learns something even with a buggy implementation of gradient descent, how can we verify that our implementation is correct? This is where the gradient checking algorithm comes in. It helps us validate our implementation by comparing the analytical gradients computed by our code against numerical approximations of those gradients.
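The core idea can be sketched with a minimal example. Here we approximate the gradient with a centered finite difference and compare it against a hand-derived analytical gradient; the function names (`loss`, `analytic_grad`, `numerical_grad`) and the tiny squared-error model are illustrative choices, not part of the original text. A small relative error (roughly 1e-7 or below) suggests the analytical gradient is implemented correctly.

```python
import numpy as np

def loss(w, x, y):
    # Squared-error loss for a toy linear model: y_hat = w * x
    return ((w * x - y) ** 2).mean()

def analytic_grad(w, x, y):
    # Hand-derived gradient of the loss with respect to w:
    # d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    return (2 * (w * x - y) * x).mean()

def numerical_grad(f, w, eps=1e-5):
    # Centered finite difference: (f(w + eps) - f(w - eps)) / (2 * eps)
    return (f(w + eps) - f(w - eps)) / (2 * eps)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
w = 0.5

g_analytic = analytic_grad(w, x, y)
g_numeric = numerical_grad(lambda w_: loss(w_, x, y), w)

# Relative error between the two gradients; a tiny value indicates
# the analytical gradient matches the numerical approximation.
rel_err = abs(g_analytic - g_numeric) / max(abs(g_analytic), abs(g_numeric))
print(rel_err)
```

In practice, the same check is applied to every parameter of the network: perturb one parameter at a time, recompute the loss, and compare the finite-difference estimate against the gradient produced by backpropagation.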