Regarding the more generic encoders-decoders, their applications are numerous. They are used to convert images, to map them from one domain or modality to another. For example, such models are often applied to depth regression, that is, the estimation of the distance between the camera and the image content (the depth) for each pixel. This is an important operation for augmented-reality applications, for example, since it allows them to build a 3D representation of the surroundings, and thus to better interact with the environment.
Similarly, encoders-decoders are commonly used for semantic segmentation (refer to Chapter 1, Computer Vision and Neural Networks, for its definition). In this case, the networks are trained not to return the depth, but the estimated class for each pixel (refer to Figure 6-2-c). This important application will be detailed in the second part of this chapter. Finally, encoders-decoders are also famous for their more artistic use cases, such as transforming...