AlexNet was a game changer: it was the first CNN successfully trained for such a complex recognition task, and it made several contributions that are still valid today, such as the following:
- The use of the rectified linear unit (ReLU) as an activation function, which mitigates the vanishing gradient problem (explained later in this chapter) and thus improves training (compared to using sigmoid or tanh)
- The application of dropout to CNNs (with all the benefits covered in Chapter 3, Modern Neural Networks)
- The typical CNN architecture combining blocks of convolution and pooling layers, with dense layers afterward for the final prediction (a simplified version of this pattern is sketched in code after this list)
- The application of random transformations (image translation, horizontal flipping, and more) to synthetically augment the dataset, that is, increasing the number of different training images by randomly editing the original samples (see Chapter 7, Training on Complex and Scarce Datasets, for more details, and the short augmentation example after this list)
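To make the architectural pattern concrete, here is a minimal sketch using tf.keras: a stack of convolution and pooling blocks with ReLU activations, followed by dense layers with dropout for the final prediction. It is an illustrative simplification rather than a faithful reproduction of the original AlexNet (the filter counts roughly follow the paper, but details such as local response normalization and the two-GPU split are omitted):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def simplified_alexnet(input_shape=(227, 227, 3), num_classes=1000):
    """Build a simplified AlexNet-like model (illustrative, not the exact original)."""
    model = models.Sequential([
        # Blocks of convolution + pooling layers, all using ReLU activations:
        layers.Conv2D(96, 11, strides=4, activation='relu', input_shape=input_shape),
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(256, 5, padding='same', activation='relu'),
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(384, 3, padding='same', activation='relu'),
        layers.Conv2D(384, 3, padding='same', activation='relu'),
        layers.Conv2D(256, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(3, strides=2),
        # Dense layers for the final prediction, with dropout for regularization:
        layers.Flatten(),
        layers.Dense(4096, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(4096, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax')
    ])
    return model

model = simplified_alexnet()
model.summary()
```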
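Similarly, the random transformations used for augmentation can be expressed with a few tf.image operations. The following sketch assumes 227 × 227 RGB images and a tf.data pipeline named train_dataset (both assumptions made for the example); it only illustrates horizontal flipping and a small translation-like shift, while more complete pipelines are covered in Chapter 7:

```python
import tensorflow as tf

def augment(image, label):
    """Randomly transform an image to synthetically augment the dataset."""
    # Horizontal flipping (applied with a 50% chance):
    image = tf.image.random_flip_left_right(image)
    # Small random translation, done by padding then cropping back to the original size:
    image = tf.image.resize_with_crop_or_pad(image, 227 + 16, 227 + 16)
    image = tf.image.random_crop(image, size=[227, 227, 3])
    return image, label

# Applied to a tf.data pipeline (assuming train_dataset yields (image, label) pairs):
# train_dataset = train_dataset.map(augment, num_parallel_calls=tf.data.AUTOTUNE)
```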
Still, even back then...