All the Inception blocks so far start by splitting the input into several parallel paths. Each path continues with a dimensionality-reducing 1×1 cross-channel convolution, followed by regular cross-channel convolutions. On the one hand, the 1×1 convolution maps cross-channel correlations but not spatial ones, because its 1×1 filter covers a single pixel. On the other hand, the subsequent cross-channel convolutions map both types of correlation. Recall that in Chapter 2, Understanding Convolutional Networks, we introduced depthwise separable convolutions (DSC), which combine the following two operations:
- A depthwise convolution: Each input slice (channel) produces a single output slice, so the operation maps only spatial (and not cross-channel) correlations.
- A 1×1 cross-channel convolution: With 1×1 convolutions, we have...
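
The two operations can be sketched in plain NumPy to make the split between spatial and cross-channel mapping concrete. This is a minimal illustration under stated assumptions, not the implementation from Chapter 2: the function names (`depthwise_conv`, `pointwise_conv`, `depthwise_separable_conv`) are hypothetical, the input is a single image in channels-first layout, and the depthwise step uses stride 1 with no padding. As in most deep learning libraries, the "convolution" here is computed as a cross-correlation.

```python
import numpy as np

def depthwise_conv(x, dw_kernels):
    """Filter each input slice with its own k x k kernel (stride 1, no padding).

    x:          array of shape (C, H, W)
    dw_kernels: array of shape (C, k, k), one kernel per input slice
    """
    c, h, w = x.shape
    k = dw_kernels.shape[1]
    out = np.zeros((c, h - k + 1, w - k + 1))
    for ch in range(c):  # slices never interact: spatial mapping only
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[ch, i, j] = np.sum(x[ch, i:i + k, j:j + k] * dw_kernels[ch])
    return out

def pointwise_conv(x, pw_weights):
    """1x1 cross-channel convolution: mix channels independently at each pixel.

    x:          array of shape (C_in, H, W)
    pw_weights: array of shape (C_out, C_in)
    """
    return np.einsum("oc,chw->ohw", pw_weights, x)

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise step followed by the 1x1 step, in the order listed above."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)

# Illustrative sizes (assumptions): 3 input slices, 3x3 kernels, 16 output slices.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
dw = rng.standard_normal((3, 3, 3))
pw = rng.standard_normal((16, 3))

y = depthwise_separable_conv(x, dw, pw)
print(y.shape)  # (16, 6, 6)
```

In this sketch, changing one input slice affects only the matching slice of the depthwise output, while the 1×1 step at a given pixel reads every slice but no neighboring pixels.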