Though not a novel contribution in itself, Szegedy et al. popularized the following technique by applying it effectively in their network.
As previously mentioned in the Replacing fully connected layers with convolutions section, 1 × 1 convolutional layers (with a stride of 1) are often used to change the overall depth of input volumes without affecting their spatial structures. Such a layer with N filters takes an input of shape H × W × D and returns an H × W × N tensor. For each pixel of the input, its D channel values are linearly combined by the layer (according to its filter weights) into N channel values.
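The depth-changing behavior described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the layer implementation of any particular framework: the helper name `conv1x1`, the `(D, N)` weight layout, and the sizes used are assumptions chosen for the example.

```python
import numpy as np

def conv1x1(x, weights, bias=None):
    """Apply a 1 x 1 convolution (stride 1) to an input of shape (H, W, D).

    `weights` has shape (D, N): one column of D weights per output filter.
    (Hypothetical helper; the weight layout is an assumption for illustration.)
    """
    # The same D -> N linear map is applied independently at every pixel.
    out = np.einsum("hwd,dn->hwn", x, weights)
    if bias is not None:
        out = out + bias
    return out

rng = np.random.default_rng(0)
H, W, D, N = 4, 5, 8, 3          # arbitrary sizes, with N < D
x = rng.normal(size=(H, W, D))   # input volume
weights = rng.normal(size=(D, N))

y = conv1x1(x, weights)
print(y.shape)  # (4, 5, 3): spatial size unchanged, depth goes from D to N
```

Each output pixel `y[i, j]` is simply `x[i, j] @ weights`, which is why the spatial structure is left untouched while the depth changes.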
This property can be applied to reduce the number of parameters required for larger convolutions by compressing the features' depth beforehand (using N < D). This technique basically uses 1 × 1 convolutions as bottlenecks (that is, as intermediary...
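To get a feel for the savings, the parameter counts can be worked out by hand. The figures below are hypothetical (an input depth of 256, a 5 × 5 convolution with 64 filters, and a bottleneck depth of N = 32); they are not taken from the network of Szegedy et al., and the helper `conv_params` is an assumption made for this sketch.

```python
def conv_params(kernel_size, depth_in, depth_out, use_bias=True):
    """Number of trainable parameters of a square 2D convolution."""
    weights = kernel_size * kernel_size * depth_in * depth_out
    biases = depth_out if use_bias else 0
    return weights + biases

# Hypothetical sizes, chosen only for illustration.
D, N, filters = 256, 32, 64

# 5 x 5 convolution applied directly to the D-channel input.
direct = conv_params(5, D, filters)

# 1 x 1 bottleneck compressing D -> N, followed by the same 5 x 5 convolution.
bottleneck = conv_params(1, D, N) + conv_params(5, N, filters)

print(direct)      # 409664
print(bottleneck)  # 59488
```

With these assumed sizes, inserting the bottleneck cuts the parameter count by roughly a factor of seven, while the output shape of the 5 × 5 convolution is unchanged.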