The first breakthrough CNN architecture arrived in 2012. This award-winning network is called AlexNet. It was developed at the University of Toronto by Alex Krizhevsky and his advisor, Geoffrey Hinton.
This network used ReLU activation functions, a dropout rate of 0.5, and heavy data augmentation to fight overfitting. As we can see in the following image, the architecture also includes a (local response) normalization layer, but this is no longer used in practice. AlexNet is still used today even though more accurate networks are available, because of its relatively simple structure and small depth. It is widely used in computer vision:
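To make the two regularization-related choices concrete, the following is a minimal NumPy sketch of ReLU and (inverted) dropout with a rate of 0.5, applied to a toy fully connected layer. This is an illustrative sketch only, not AlexNet's actual implementation; the layer sizes and function names here are hypothetical.

```python
import numpy as np

def relu(x):
    # ReLU: pass positive values through unchanged, zero out negatives.
    return np.maximum(0, x)

def dropout(x, p=0.5, training=True, rng=None):
    # Inverted dropout: during training, zero each unit with probability p
    # and rescale the survivors by 1/(1-p) so the expected activation is
    # unchanged. At inference time the input passes through untouched.
    if not training:
        return x
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

# Toy forward pass through one fully connected layer (sizes are arbitrary).
rng = np.random.default_rng(0)
x = rng.standard_normal(8)          # input activations
w = rng.standard_normal((8, 8))     # weight matrix
h = dropout(relu(x @ w), p=0.5, training=True, rng=rng)
print(h)
```

With p=0.5, roughly half of the units are zeroed on each training pass, which prevents the network from relying too heavily on any single feature.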
AlexNet was trained on the ImageNet database using two separate GPUs, because a single GPU of that era did not have enough memory to hold the whole network, as shown in the...