Decoder
- The decoder generates the output sequence autoregressively: it emits one element at a time, feeding each output back in, until an end-of-sequence marker is produced.
- Typically an LSTM- or GRU-based RNN is used, where the context vector is the final hidden state of the encoder and initializes the first hidden state of the decoder.
- To keep the context vector's influence from fading as decoding proceeds, one solution is to pass the context vector as an input to the computation of every decoder hidden state, not just the first.
- To keep track of what has and has not been generated yet, the output at each step is conditioned on three parts: the newly computed hidden state, the output generated at the previous step, and the encoder context (see the sketch after this list).
- Beam search is used to improve the output: greedily choosing the argmax at each step treats the choices independently, which can yield an unreliable result, since the sequence of locally best tokens is not necessarily the globally most probable sequence.
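A minimal sketch of such a decoder step in PyTorch (the names `ContextConditionedDecoder`, `step`, and `greedy_generate` are illustrative assumptions, not from the source): the context vector initializes the hidden state, is re-fed at every step, and the output layer is conditioned on the new hidden state, the previous output's embedding, and the context.

```python
import torch
import torch.nn as nn

class ContextConditionedDecoder(nn.Module):
    """GRU decoder that re-feeds the encoder context at every step,
    so its influence does not fade over long outputs."""

    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Input at each step = previous output's embedding + static context.
        self.gru = nn.GRU(embed_dim + hidden_dim, hidden_dim, batch_first=True)
        # Output conditioned on three parts: new hidden state, previous
        # output's embedding, and the encoder context.
        self.out = nn.Linear(hidden_dim + embed_dim + hidden_dim, vocab_size)

    def step(self, prev_token, hidden, context):
        emb = self.embed(prev_token)               # (batch, 1, embed_dim)
        ctx = context.unsqueeze(1)                 # (batch, 1, hidden_dim)
        output, hidden = self.gru(torch.cat([emb, ctx], dim=-1), hidden)
        logits = self.out(torch.cat([output, emb, ctx], dim=-1))
        return logits, hidden

def greedy_generate(decoder, context, bos_id, eos_id, max_len=50):
    hidden = context.unsqueeze(0)                  # context initializes h_0
    token = torch.full((context.size(0), 1), bos_id, dtype=torch.long)
    generated = []
    for _ in range(max_len):                       # one element at a time ...
        logits, hidden = decoder.step(token, hidden, context)
        token = logits.argmax(dim=-1)
        generated.append(token)
        if (token == eos_id).all():                # ... until end-of-sequence
            break
    return torch.cat(generated, dim=1)
```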
Tags
Data Science
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Encoder
Decoder
Context vector
Encoder-Decoder with Transformers
Multi-lingual Pre-training for Encoder-Decoder Models
Mathematical Formulation of an Encoder-Decoder Model
Seq2seq Models for Text Generation
Auto-Regressive Decoding in Machine Translation
Applying Encoder-Decoder Architectures to NLP via the Text-to-Text Framework
A sequence-to-sequence model is designed to translate English sentences into French. When given the English input, 'The quick brown fox jumps over the lazy dog,' the model produces the French output, 'Où est la bibliothèque?' ('Where is the library?'). The generated French sentence is grammatically perfect and fluent, but it is completely unrelated to the meaning of the English input. Based on this specific failure, which component of the underlying architecture is most likely the primary source of the error?
Diagnosing an Architectural Flaw in a Summarization Model
Arrange the following events to accurately describe the flow of information in a standard encoder-decoder architecture for a sequence-to-sequence task.
Your team is pretraining an internal T5-style enco...
Your company wants one internal model to support m...
Your team is pretraining an internal T5-style mode...
Your team is building a single internal T5-style t...
Diagnosing a T5-Style Model That Ignores Task Prefixes After Span-Denoising Pretraining
Choosing Between Span-Denoising Pretraining and Task-Specific Fine-Tuning in a T5-Style Text-to-Text System
Designing a Unified Text-to-Text Model and Pretraining Objective for Multiple NLP Features
Root-Cause Analysis of a T5-Style Model Producing Fluent but Unfaithful Outputs
Selecting an Architecture and Pretraining Objective for a Unified Internal NLP Service
Post-Pretraining Data Formatting Bug in a T5-Style Text-to-Text Service
Pre-training Encoder-Decoder Models via Masked Language Modeling
Decoder
A language model is generating a sentence and considers two different methods for choosing the sequence of words:
- Method A: At each step, the model selects the single most probable word and adds it to the sequence before moving to the next step.
- Method B: At each step, the model keeps track of the three most probable partial sentences generated so far, extends each of them with their most likely next words, and then keeps the three best resulting sentences to continue the process.
Which statement best analyzes the fundamental trade-off between these two methods in the context of finding the best possible output sequence? (A sketch of both methods follows.)
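For concreteness, a minimal sketch of the two methods described above, assuming a hypothetical scoring function `next_log_probs(prefix)` that returns a dict mapping each possible next token to its log-probability; Method B below is implemented as standard beam search, which considers every extension of each kept hypothesis rather than only the most likely ones.

```python
import heapq

def greedy_search(next_log_probs, bos, eos, max_len=20):
    """Method A: commit to the single most probable token at each step."""
    seq = [bos]
    while seq[-1] != eos and len(seq) < max_len:
        probs = next_log_probs(seq)                # {token: log-prob}
        seq.append(max(probs, key=probs.get))
    return seq

def beam_search(next_log_probs, bos, eos, beam_width=3, max_len=20):
    """Method B: keep the beam_width best partial sequences at each step."""
    beams = [(0.0, [bos])]                         # (cumulative log-prob, tokens)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos:                     # finished hypotheses carry over
                candidates.append((score, seq))
                continue
            for tok, lp in next_log_probs(seq).items():
                candidates.append((score + lp, seq + [tok]))
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if all(seq[-1] == eos for _, seq in beams):
            break
    return max(beams, key=lambda c: c[0])[1]
```

Greedy decoding is cheap but can commit to a locally best token that leads into a low-probability sequence; beam search explores `beam_width` hypotheses at roughly `beam_width` times the cost and usually finds better sequences, though it still offers no guarantee of finding the globally most probable one.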
Inferring Search Strategy from LLM Output
Explaining the Search Problem in Text Generation
Learn After
Beam search
Auto-regressive Decoding in Machine Translation
An autoregressive sequence generation model is tasked with producing an output. At each step, it calculates the probability for every possible next element and selects the single element with the highest probability before moving to the next step. What is the primary limitation of this step-by-step selection strategy?
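As a concrete illustration of that limitation, a toy two-step example with assumed probabilities: the per-step argmax commits to "A" (joint probability 0.18) and misses the globally most probable sequence, which starts with the less probable token "B" (joint probability 0.36).

```python
# Toy two-step example (assumed probabilities) showing how always taking the
# per-step argmax can miss the globally most probable sequence.
step1 = {"A": 0.6, "B": 0.4}                # greedy picks "A"
step2 = {"A": {"C": 0.3, "D": 0.2},         # continuations after "A"
         "B": {"E": 0.9, "F": 0.1}}         # continuations after "B"

greedy_first = max(step1, key=step1.get)
greedy_joint = step1[greedy_first] * max(step2[greedy_first].values())

best_joint, best_seq = max(
    (step1[t1] * p2, (t1, t2))
    for t1, nxt in step2.items()
    for t2, p2 in nxt.items()
)

print(greedy_first, greedy_joint)           # A 0.18
print(best_seq, best_joint)                 # ('B', 'E') 0.36
```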
Decoder Input Analysis
Diagnosing Translation Degradation