In this section, we're going to build a Java application that recognizes cats and dogs, using the VGG-16 architecture and transfer learning. Let's revisit the VGG-16 architecture (explained previously in the Working with classical networks section).
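As a quick refresher on that architecture, the following sketch traces how the tensor shape changes across the five convolution blocks of VGG-16. It assumes the standard 224 x 224 x 3 input and the usual channel widths (64, 128, 256, 512, 512). The class name `Vgg16ShapeWalk` and the method `traceShapes` are hypothetical helpers for illustration only; they are not part of the application built in this section.

```java
import java.util.ArrayList;
import java.util.List;

public class Vgg16ShapeWalk {

    // Output channels of each convolution block in the standard VGG-16
    // configuration (an assumption of this sketch).
    static final int[] BLOCK_CHANNELS = {64, 128, 256, 512, 512};

    /** Returns {height, width, channels} after each block for the given input size. */
    static List<int[]> traceShapes(int height, int width) {
        List<int[]> shapes = new ArrayList<>();
        for (int channels : BLOCK_CHANNELS) {
            // The 3 x 3 "same" convolutions in a block keep height and width
            // and set the channel count for that block.
            // The 2 x 2 max pooling with stride 2 that closes the block halves
            // height and width and leaves the channel count unchanged.
            height /= 2;
            width /= 2;
            shapes.add(new int[] {height, width, channels});
        }
        return shapes;
    }

    public static void main(String[] args) {
        for (int[] s : traceShapes(224, 224)) {
            System.out.println(s[0] + " x " + s[1] + " x " + s[2]);
        }
    }
}
```

With a 224 x 224 input, the last block yields a 7 x 7 x 512 volume, which is what the fully connected layers at the end of the network receive.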
The VGG-16 architecture is quite uniform. Its convolutional part uses only two building blocks: a 3 x 3 same convolution, which leaves the first two dimensions untouched and increases the number of channels in the third dimension, and a 2 x 2 max pooling with a stride of 2, which halves the first two dimensions and leaves the third dimension untouched. The idea with many convolution architectures is eventually to shrink the first two dimensions and increase the number of channels; if we look at the output of these convolution layers...