Learn Before
CLIP (Contrastive Language-Image Pre-training)
CLIP (Contrastive Language-Image Pre-training) is a multimodal model that maps text and images into a shared embedding space. It pairs a Transformer text encoder, architecturally similar to GPT-2, with an image encoder such as a Vision Transformer, and trains the two jointly with a contrastive objective so that matching image-caption pairs receive similar embeddings. The resulting image and text embeddings later served as a foundation for the DALL-E 2 text-to-image system.
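To make the contrastive objective concrete, here is a minimal pure-Python sketch. It assumes the image and text encoders have already produced embedding vectors (the toy vectors below are hypothetical stand-ins for real encoder outputs) and computes a symmetric cross-entropy loss over the batch's cosine-similarity matrix, where the k-th image and k-th caption form the correct pair. The fixed `temperature=0.07` is an assumption for illustration; CLIP itself learns this scale during training. All function names are hypothetical helpers, not part of any CLIP library.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_logits(image_embs, text_embs, temperature=0.07):
    """Cosine similarity between every image and every caption, divided by temperature.

    Row i corresponds to image i, column j to caption j.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[k]  # equals -log softmax(row)[k]
    return total / len(logits)


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric loss: image-to-text over rows plus text-to-image over columns, averaged."""
    logits = similarity_logits(image_embs, text_embs, temperature)
    loss_image_to_text = cross_entropy_diagonal(logits)
    loss_text_to_image = cross_entropy_diagonal(transpose(logits))
    return (loss_image_to_text + loss_text_to_image) / 2


def zero_shot_classify(image_emb, label_text_embs):
    """Return the index of the label embedding most cosine-similar to the image embedding."""
    img = l2_normalize(image_emb)
    scores = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in label_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned_captions = [[0.9, 0.1], [0.1, 0.9]]     # caption k matches image k
    mismatched_captions = [[0.1, 0.9], [0.9, 0.1]]  # pairs swapped
    print("aligned loss:", clip_contrastive_loss(images, aligned_captions))
    print("mismatched loss:", clip_contrastive_loss(images, mismatched_captions))
    print("predicted label:", zero_shot_classify([0.2, 0.8], [[1.0, 0.0], [0.0, 1.0]]))
```

Averaging both directions penalizes an image that prefers the wrong caption and a caption that prefers the wrong image equally. The same normalized embeddings support zero-shot classification: encode one text prompt per label and pick the label whose embedding lies closest to the image's.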
Tags
D2L
Dive into Deep Learning @ D2L
Related
The creators of the large-scale, unsupervised language model introduced in 2019 initially withheld the full version from the public, citing concerns about potential misuse. Which statement best evaluates the significance of this 'staged release' strategy for the field of artificial intelligence?
Analysis of Model Scaling Impact
Evaluating Model Capabilities in a Research Scenario
In-Context Learning
In-Context Learning (ICL)
Megatron-Turing NLG
Gopher