In this chapter, we learned how to leverage U-Net and Mask R-CNN to perform image segmentation. We saw how the U-Net architecture downscales and then upscales an image using convolutions, with skip connections carrying spatial detail from the downscaling path to the upscaling path, so that it retains the structure of the image while still predicting masks around the objects within it. We then cemented this understanding with the road scene segmentation exercise, in which we segmented each image into multiple classes. Next, we learned about RoI Align, which addresses the quantization issues of RoI pooling: instead of snapping region coordinates to the integer grid of the feature map, it uses bilinear interpolation to sample feature values at their exact fractional locations. After that, we learned how Mask R-CNN works, which allowed us to train models that predict instances of people in images, as well as instances of both people and tables.
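To make the quantization point concrete, here is a minimal pure-Python sketch, not the torchvision implementation. It samples a small feature map at a fractional coordinate in two ways: by snapping to an integer cell, as RoI pooling effectively does (truncation is used here for simplicity; implementations may round instead), and by bilinear interpolation, the per-point operation RoI Align relies on. The function names, the 4x4 feature map, and the stride of 32 are hypothetical choices for illustration only. Real RoI Align also samples several points per output bin and averages them, which this sketch omits.

```python
# Hypothetical illustration only: compares quantized sampling (RoI-pooling style)
# with bilinear sampling (the per-point step used by RoI Align).

def quantized_sample(fmap, y, x):
    """Snap the fractional coordinate to an integer cell (truncation here)."""
    return fmap[int(y)][int(x)]  # the fractional offset is simply discarded

def bilinear_sample(fmap, y, x):
    """Interpolate between the four cells surrounding (y, x)."""
    y0, x0 = int(y), int(x)
    y1 = min(y0 + 1, len(fmap) - 1)     # clamp at the bottom edge
    x1 = min(x0 + 1, len(fmap[0]) - 1)  # clamp at the right edge
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

# Hypothetical 4x4 feature map whose value at (row, col) is 10*row + col.
fmap = [[10 * r + c for c in range(4)] for r in range(4)]
# Image pixel 84 projected onto an assumed stride-32 feature map -> 2.625.
y = x = 84 / 32
print(quantized_sample(fmap, y, x))  # 22
print(bilinear_sample(fmap, y, x))   # 28.875
```

Because this feature map varies linearly, the bilinear result (28.875) equals the true value at (2.625, 2.625), while the quantized lookup returns the value of a neighbouring cell; that misalignment is what RoI Align avoids.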
Now that we have a good understanding of various object detection and image segmentation techniques, in the next chapter we will learn about applications that leverage the techniques we have learned so far...