DeepLab is a state-of-the-art model for semantic segmentation. It was developed and open sourced by Google in 2016, and multiple versions with successive improvements have been released since then: DeepLab V2, DeepLab V3, and DeepLab V3+.
Before the release of DeepLab V3+, two complementary approaches were in use: spatial pyramid pooling networks encode multi-scale contextual information by applying filters and pooling operations at different rates, while encoder-decoder networks capture objects with sharper boundaries by gradually recovering spatial information. DeepLabv3+ combines the two, using both a spatial pyramid pooling module and an encoder-decoder structure.
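To make this concrete, the following is a minimal tf.keras sketch of an atrous spatial pyramid pooling (ASPP) block and a simple decoder step. The function names, filter counts, and dilation rates here are illustrative choices for the sketch, not the reference DeepLabv3+ implementation:

```python
from tensorflow.keras import layers

def aspp_block(x, filters=256, rates=(6, 12, 18)):
    # Parallel convolutions at several dilation rates capture multi-scale
    # context without reducing spatial resolution (rates are illustrative).
    branches = [layers.Conv2D(filters, 1, padding='same', activation='relu')(x)]
    for rate in rates:
        branches.append(
            layers.Conv2D(filters, 3, padding='same', dilation_rate=rate,
                          activation='relu')(x))
    # Image-level pooling branch: global context broadcast back onto the
    # feature map (assumes a static input spatial size).
    pooled = layers.GlobalAveragePooling2D(keepdims=True)(x)
    pooled = layers.Conv2D(filters, 1, activation='relu')(pooled)
    pooled = layers.UpSampling2D(size=(x.shape[1], x.shape[2]),
                                 interpolation='bilinear')(pooled)
    branches.append(pooled)
    merged = layers.Concatenate()(branches)
    return layers.Conv2D(filters, 1, padding='same', activation='relu')(merged)

def decoder_block(encoder_features, low_level_features, num_classes=21):
    # Upsample the encoder output and fuse it with low-level backbone
    # features to recover sharper object boundaries.
    low = layers.Conv2D(48, 1, padding='same',
                        activation='relu')(low_level_features)
    x = layers.UpSampling2D(size=4, interpolation='bilinear')(encoder_features)
    x = layers.Concatenate()([x, low])
    x = layers.Conv2D(256, 3, padding='same', activation='relu')(x)
    return layers.Conv2D(num_classes, 1, padding='same')(x)
```

In a full model along these lines, encoder_features would be the ASPP output computed on deep backbone features (output stride 16) and low_level_features would come from an early backbone layer (output stride 4); the resulting logits are then bilinearly upsampled by a further factor of 4 to reach the input resolution.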
The following diagram shows the architecture of DeepLabv3+, which consists of encoder and decoder modules:
Let's look at the encoder and decoder modules in more detail:
- Encoder: In the encoder step, essential information from the input image is extracted using a pre-trained convolutional neural network.