Learn Before
Directionality in Contextual Representations
Imagine two different computational processes for creating a contextualized representation of a specific word within a sentence. Process A considers all words in the sentence, both before and after the target word, to create its representation. Process B considers only the words that come before the target word. Compare and contrast these two processes, focusing on the nature of the contextual information each one captures and the types of tasks each would be suited for.
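As a concrete sketch (illustrative, not part of the original prompt), the two processes can be pictured as the same attention computation run under different masks: Process B applies a causal mask that hides succeeding positions, while Process A attends over the whole sentence. The snippet below uses plain NumPy; the function name and uniform scores are hypothetical choices made for this example.

```python
import numpy as np

def attention_weights(seq_len: int, bidirectional: bool) -> np.ndarray:
    """Toy uniform attention showing which positions each token can use."""
    scores = np.zeros((seq_len, seq_len))
    if not bidirectional:
        # Process B: mask out succeeding positions (j > i) before the softmax.
        scores[np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)] = -np.inf
    # Row-wise softmax: row i mixes exactly the tokens visible to token i.
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Process A: every row spreads weight over the whole sentence.
print(attention_weights(4, bidirectional=True))
# Process B: lower-triangular rows; token i never sees tokens after it.
print(attention_weights(4, bidirectional=False))
```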
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Consider two different methods for creating contextualized numerical representations of words in a sentence. Method 1 generates a representation for each word based only on the words that precede it. Method 2 generates a representation for each word based on all other words in the sentence, both preceding and succeeding it. Which statement accurately compares these two methods to the processes found in large-scale language models?
Directionality in Contextual Representations
The primary functional difference between the prefilling phase in an autoregressive model and the encoding process in a model like BERT lies in how each incorporates contextual information from the input sequence: prefilling uses causal attention, so each token's representation draws only on preceding tokens, whereas BERT's encoder attends bidirectionally over the entire input. The underlying mathematical operations of transformer self-attention are otherwise largely the same.
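A minimal PyTorch sketch of this contrast (illustrative, not from the card): the same attention call, toggled between causal and bidirectional masking via the `is_causal` flag of `torch.nn.functional.scaled_dot_product_attention`. The tensor sizes and variable names here are hypothetical.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy sizes; real models use multiple heads and larger dims.
seq_len, d_model = 5, 8
x = torch.randn(1, seq_len, d_model)  # one sequence of token embeddings
q = k = v = x                         # self-attention over the same sequence

# Prefilling in an autoregressive model: causal mask, so token i attends
# only to positions <= i.
prefill = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# BERT-style encoding: bidirectional attention over the full sequence.
encode = F.scaled_dot_product_attention(q, k, v, is_causal=False)

print(torch.allclose(prefill, encode))                # False: masks differ
# The final token already sees the whole input under both masks, so its
# representation matches (up to floating-point numerics).
print(torch.allclose(prefill[:, -1], encode[:, -1]))  # True
```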