4. Encoder network for unsupervised clustering
The encoder network implementation for unsupervised clustering is shown in Figure 13.4.1. It is an encoder with a VGG-like [2] backbone and a Dense
layer with a softmax output. The backbone of VGG-11, the simplest VGG variant, is shown in Figure 13.4.2.
For MNIST, the full VGG-11 backbone cannot be used as is: its five MaxPooling2D operations each halve the feature map, which shrinks the 24 x 24 input to zero size (24, 12, 6, 3, 1, 0).
Therefore, a scaled-down version of the VGG-11 backbone, shown in Figure 13.4.3, is used in the Keras implementation. The same set of filters is used.
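The shrinking feature map can be checked with a few lines of arithmetic. The following is a minimal sketch, not the book's code: the helper name `pooled_sizes` is hypothetical, and it assumes 2 x 2 pooling with stride 2 and no padding, which are the Keras MaxPooling2D defaults.

```python
def pooled_sizes(size, num_pools, pool=2, stride=2):
    """Return the spatial size before and after each pooling step.

    Assumes unpadded ('valid') pooling, where the output size is
    floor((size - pool) / stride) + 1, clipped at zero.
    """
    sizes = [size]
    for _ in range(num_pools):
        size = max(0, (size - pool) // stride + 1)
        sizes.append(size)
    return sizes


# Five pooling steps, as in the full VGG-11 backbone, on a 24 x 24 input.
print(pooled_sizes(24, 5))  # [24, 12, 6, 3, 1, 0]
```

Under these assumptions, the fifth pooling step leaves nothing to pool, which is why fewer pooling operations are needed for a 24 x 24 input.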
Figure 13.4.1 Network implementation of the IIC encoder network. The input MNIST image is center cropped to 24 x 24 pixels. In this example, the transformation is a random 24 x 24-pixel cropping operation.
Figure 13.4.2 VGG-11 classifier backbone
In Figure 13.4.3, there are 4 Conv2D-BN-ReLU Activation-MaxPooling2D layers with (64, 128, 256, 512) filters, respectively. The last Conv2D layer does not use MaxPooling2D. Therefore...
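The layer stack described for Figure 13.4.3 can be traced shape by shape. This is a rough sketch under stated assumptions rather than the book's implementation: the function name `trace_backbone` is hypothetical, each Conv2D is assumed to use 'same' padding (so it preserves the spatial size), and each MaxPooling2D is assumed to be 2 x 2 with stride 2.

```python
def trace_backbone(input_size=24, filters=(64, 128, 256, 512)):
    """Return the (height, width, channels) output shape of each block.

    Each block is Conv2D-BN-ReLU followed by MaxPooling2D, except the
    last block, which skips the pooling step.
    """
    shapes = []
    size = input_size
    for i, n_filters in enumerate(filters):
        # Assumption: 'same' padding, so the convolution keeps the size.
        is_last = i == len(filters) - 1
        if not is_last:
            # Assumption: unpadded 2 x 2 pooling with stride 2.
            size = (size - 2) // 2 + 1
        shapes.append((size, size, n_filters))
    return shapes


for shape in trace_backbone():
    print(shape)
# (12, 12, 64)
# (6, 6, 128)
# (3, 3, 256)
# (3, 3, 512)
```

With only three pooling steps, the 24 x 24 input never drops below 3 x 3 in this trace.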