Let's take a well-known CNN, say VGG16, and look in detail at exactly how its memory is spent. You can print its summary using Keras:
from keras.applications import VGG16
model = VGG16()
model.summary()  # prints the table itself; wrapping it in print() would also print "None"
The network consists of 13 2D convolutional layers (with 3×3 filters, stride 1 and padding 1) and 3 fully connected ("Dense") layers. In addition, there is an input layer, 5 max-pooling layers and a flatten layer, none of which hold parameters.
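As a cross-check on the arithmetic, here is a minimal pure-Python sketch (not Keras code) that recomputes every layer's output shape, data memory and parameter count from the layer pattern described above. The `CONFIG` list, the helper `vgg16_layers` and the 4 bytes per value (float32) are illustrative assumptions; the 2×2 stride-2 pooling and the 4096/4096/1000-unit Dense layers are the standard ImageNet VGG16 configuration.

```python
from math import prod

# Hypothetical encoding of VGG16: output channels per 3x3 conv layer, "M" = 2x2 max-pool.
CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
          512, 512, 512, "M", 512, 512, 512, "M"]
DENSE_UNITS = [4096, 4096, 1000]
BYTES_PER_VALUE = 4  # assumption: activations stored as float32


def vgg16_layers(height=224, width=224, channels=3):
    """Yield (layer name, output shape, data memory in values, number of parameters)."""
    shape = (height, width, channels)
    yield "InputLayer", shape, prod(shape), 0
    for item in CONFIG:
        h, w, c = shape
        if item == "M":
            # 2x2 pooling with stride 2 halves height and width; no parameters.
            shape = (h // 2, w // 2, c)
            yield "MaxPool2D", shape, prod(shape), 0
        else:
            # 3x3 filters, stride 1, padding 1: the spatial size is preserved.
            shape = (h, w, item)
            params = 3 * 3 * c * item + item  # weights + one bias per filter
            yield "Conv2D", shape, prod(shape), params
    units_in = prod(shape)
    yield "Flatten", (units_in,), units_in, 0
    for units in DENSE_UNITS:
        params = units_in * units + units  # weight matrix + biases
        yield "Dense", (units,), units, params
        units_in = units


rows = list(vgg16_layers())
for name, shape, memory, params in rows:
    shape_str = "x".join(str(d) for d in shape)
    print(f"{name:<10} {shape_str:<12} {memory:>9} {params:>11}")

total_memory = sum(r[2] for r in rows)
total_params = sum(r[3] for r in rows)
print("total data memory (values):", total_memory)
print(f"≈ {total_memory * BYTES_PER_VALUE / 2**20:.1f} MiB per image (float32, forward pass)")
print("total parameters:", total_params)
```

The first rows reproduce the entries in the table (1792 and 36928 parameters for the first two convolutions), and the totals show the split: the bulk of the data memory sits in the early, high-resolution convolutions, while the three Dense layers hold most of the 138,357,544 parameters.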
| Layer | Output shape | Data memory (values) | Parameters | Number of parameters |
|---|---|---|---|---|
| InputLayer | 224×224×3 | 150528 | 0 | 0 |
| Conv2D | 224×224×64 | 3211264 | 3×3×3×64+64 | 1792 |
| Conv2D | 224×224×64 | 3211264 | 3×3×64×64+64 | 36928 |
| MaxPool2D | 112×112×64 | 802816 | 0 | 0 |
| Conv2D... | | | | |