So far, we have covered the theory behind three specific ways to improve our model's generalizability on unseen data. First, we can vary our network's size to ensure it has no excess learning capacity. Second, we can penalize inefficient representations by regularizing the network's weight parameters. Finally, we can add dropout layers to prevent our network from getting lazy. As we noted previously, seeing is believing.
Now, let's put this understanding into practice on the MNIST dataset with some Keras code. As we saw previously, changing the network's size is simply a matter of changing the number of neurons per layer. In Keras, this is done as we add layers, like so:
import keras.regularizers
from keras.models import Sequential
from keras.layers import Dense, Flatten

model = Sequential()
model.add(Flatten(input_shape=(28, 28)))  # flatten each 28 x 28 MNIST image into a vector
model.add(Dense(1024, kernel_regularizer=
...
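For reference, here is one way the truncated snippet above might be completed into a full, trainable model. The specific choices, an L2 penalty of 0.001, a dropout rate of 0.5, and ReLU/softmax activations, are illustrative assumptions rather than values taken from the text, but they show all three techniques (layer size, weight regularization, and dropout) in one place:

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras import regularizers

model = Sequential()
model.add(Flatten(input_shape=(28, 28)))        # flatten 28 x 28 images into 784-dim vectors
model.add(Dense(1024, activation='relu',
                kernel_regularizer=regularizers.l2(0.001)))  # L2 penalty on this layer's weights (assumed strength)
model.add(Dropout(0.5))                         # randomly drop half the activations during training (assumed rate)
model.add(Dense(10, activation='softmax'))      # one output per MNIST digit class

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Changing the 1024 adjusts the layer's capacity, swapping regularizers.l2 for regularizers.l1 changes the weight penalty, and tuning the Dropout rate controls how aggressively activations are dropped, which is all we need to experiment with the three techniques discussed so far.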