A simple trick to train a network for super-resolution is to use a traditional upscaling method (such as bilinear interpolation) to scale the images to the target dimensions before feeding them to the model. This way, the network can be trained as a denoising autoencoder (AE), whose task is to remove the upscaling artifacts and recover the lost details:
x_noisy = tf.image.resize(tf.image.resize(x_train, (h // 4, w // 4)), (h, w))  # bilinear by default; h, w = image dims
fcn_8s.fit(x_noisy, x_train)
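To make the degradation step concrete, the following is a minimal NumPy sketch of the downscale-then-upscale preprocessing. The `bilinear_resize` helper and the toy batch are illustrative assumptions, not part of any specific library; in practice, a framework routine such as `tf.image.resize` would be used instead:

```python
import numpy as np

def bilinear_resize(img, new_h, new_w):
    """Resize an HxW or HxWxC image with bilinear interpolation (NumPy sketch)."""
    h, w = img.shape[:2]
    # Source coordinates corresponding to each target pixel
    ys = np.linspace(0, h - 1, new_h)
    xs = np.linspace(0, w - 1, new_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    if img.ndim == 3:  # broadcast weights over the channel axis
        wy = wy[..., None]; wx = wx[..., None]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

# Toy batch of 64x64 RGB images (hypothetical data)
x_train = np.random.rand(8, 64, 64, 3)
# Degrade each image: downscale by 4x, then upscale back to the original size
x_noisy = np.stack([
    bilinear_resize(bilinear_resize(im, 16, 16), 64, 64) for im in x_train
])
```

The resulting `x_noisy` batch has the same dimensions as `x_train`, so the pair can be fed directly to a fully convolutional network trained with a reconstruction loss, as in the `fit` call above.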
As mentioned earlier, the architectures we just covered are commonly applied to a wide range of tasks, such as depth estimation from color images, next-frame prediction (that is, predicting what the content of the next image could be, taking a series of video frames as input), and image segmentation. In the second part of this chapter, we will tackle the latter task, which is essential in many real-life applications.