Zero-shot Image Classification with OpenAI's CLIP

State-of-the-art (SotA) computer vision (CV) models have a restricted understanding of the visual world, limited to what their training data covers [1].

These models can perform very well on specific tasks and datasets, but they do not generalize well: they cannot handle new classes or images from outside the domain they were trained on.

