Exploring zero-shot learning
Zero-shot learning is a paradigm in which a trained machine learning model is applied to new tasks without any training on those tasks. At its core, the method is a form of transfer learning, but instead of requiring additional learning on the downstream task, no learning is done at all. The approach we will use to realize zero-shot learning here builds on CLIP as a base, and is therefore an extension of an unsupervised learning method. A minimal sketch of how this looks in practice is shown below.
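The following sketch illustrates zero-shot image classification with a pre-trained CLIP model. It assumes the Hugging Face transformers library and the "openai/clip-vit-base-patch32" checkpoint; the image path and candidate labels are placeholders, and no fine-tuning takes place at any point.

```python
# A minimal sketch of zero-shot classification with CLIP, assuming the
# Hugging Face transformers library and the "openai/clip-vit-base-patch32"
# checkpoint. The image path and label set below are placeholders.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate classes are expressed as natural-language prompts. Because no
# training is performed on these classes, the setup is zero-shot.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image = Image.open("example.jpg")  # placeholder image path

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image-text similarity scores; softmax turns
# them into probabilities over the candidate labels.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

Note that the "classifier" here is nothing more than the similarity between the image embedding and the text embeddings of the candidate prompts, which is exactly the image-text retrieval objective CLIP was pre-trained on.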
CLIP can be used to perform zero-shot learning on a wide variety of downstream tasks. To recap, CLIP is pre-trained on an image-text retrieval task. As long as CLIP is applied to a downstream task without any additional learning process, the setup can be considered zero-shot learning. The tested use cases include optical character recognition, action recognition in videos, image-based geo-localization, and many types of fine-grained image object classification. Additionally, there are basic...