Understand CLIP (Contrastive Language-Image Pre-Training) — Visual Models from NLP
CLIP is a model that enables zero-shot learning on an entirely new dataset (not just on a new example) by using natural language to supervise pre-training. That is, to identify an object, you can provide the name or a description of an object class the model has not seen before. Traditionally a computer vision model was trained …
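To make the zero-shot idea concrete, here is a minimal sketch in plain Python. It assumes an image and a set of candidate text descriptions have already been mapped into a shared embedding space, and it picks the description most similar to the image. The three-dimensional vectors, the prompt strings, the function names, and the temperature value are all illustrative assumptions, not output from a real CLIP model or part of any library API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, temperature=0.07):
    """Score an image against each text description and return probabilities.

    temperature is an assumed scaling constant for this sketch; smaller
    values make the resulting distribution sharper.
    """
    labels = list(text_embeddings)
    logits = [
        cosine_similarity(image_embedding, text_embeddings[label]) / temperature
        for label in labels
    ]
    return dict(zip(labels, softmax(logits)))


# Hypothetical toy embeddings standing in for real encoder outputs.
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a photo of a dog": [1.0, 0.0, 0.1],
    "a photo of a cat": [0.0, 1.0, 0.1],
    "a photo of a zebra": [0.1, 0.2, 1.0],
}

probs = zero_shot_classify(image_embedding, text_embeddings)
best_label = max(probs, key=probs.get)
print(best_label)
```

Adding a new class here only requires adding a new text description and its embedding; nothing is retrained, which is the property the zero-shot claim above refers to.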