In this chapter, we saw how to improve the simple linear network developed in Chapter 3, Computational Graphs and Linear Models. We can add linear layers, increase the width of the network, train for more epochs, and tune the learning rate. However, a purely linear network cannot capture the nonlinear features of a dataset, so at some point its performance will plateau. Convolutional layers, by contrast, use kernels to extract local spatial features, and when combined with nonlinear activation functions they allow the network to learn nonlinear structure. We saw that with two convolutional layers, performance on MNIST improved significantly.
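As a reminder of the overall shape of such a model, here is a minimal sketch of a two-convolutional-layer MNIST classifier in PyTorch. The specific channel counts, kernel sizes, and pooling choices are illustrative assumptions, not the exact values used in the chapter:

```python
import torch
import torch.nn as nn

class ConvNet(nn.Module):
    """Sketch of a two-conv-layer classifier for 1x28x28 MNIST images.
    Layer sizes are illustrative assumptions, not the book's exact model."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),   # 1x28x28 -> 16x28x28
            nn.ReLU(),             # the nonlinearity comes from the activation
            nn.MaxPool2d(2),       # 16x28x28 -> 16x14x14
            nn.Conv2d(16, 32, kernel_size=5, padding=2),  # -> 32x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),       # -> 32x7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, 10)  # 10 digit classes

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = ConvNet()
out = model(torch.randn(4, 1, 28, 28))  # a batch of 4 fake MNIST images
print(out.shape)  # torch.Size([4, 10])
```

Note that each `Conv2d` is itself a linear operation; it is the interleaved `ReLU` activations that give the network its nonlinear capacity.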
In the next chapter, we'll look at some different network architectures, including recurrent networks and long short-term memory (LSTM) networks.