Images are complex structures with a large number of values (that is, H × W × D values, with H indicating the image's height, W its width, and D its depth, or number of channels, such as D = 3 for RGB images). Even the small, single-channel images we used as examples in the first two chapters represent input vectors of 28 × 28 × 1 = 784 values each. For the first layer of the basic neural network we implemented, this meant a weight matrix of shape (784, 64), which equates to 784 × 64 = 50,176 parameter values to optimize, just for this variable!
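This number of parameters grows rapidly when we consider larger RGB images or deeper networks: the weight count of a fully connected first layer scales linearly with H × W × D, and every additional layer contributes a weight matrix of its own.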
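To make the arithmetic concrete, here is a minimal sketch that counts the weights of a fully connected first layer for a given image size. The helper name `count_dense_weights` and the 224 × 224 × 3 input size are assumptions chosen for illustration; they do not come from the network implemented earlier. Biases are left out, to match the weight-matrix count above.

```python
def count_dense_weights(height, width, depth, num_units):
    """Number of weights linking a flattened H x W x D image to a dense layer.

    Hypothetical helper for illustration; bias terms are not counted.
    """
    num_inputs = height * width * depth  # one input value per pixel per channel
    return num_inputs * num_units        # one weight per (input, unit) pair


# The 28 x 28 single-channel case from the text, with 64 units in the first layer.
print(count_dense_weights(28, 28, 1, 64))    # 50176

# Assumed larger RGB input (224 x 224 x 3) feeding the same 64-unit layer.
print(count_dense_weights(224, 224, 3, 64))  # 9633792
```

With the layer width unchanged, moving from the small grayscale image to this assumed RGB size multiplies the number of first-layer weights by 192, from about fifty thousand to more than nine million.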