Concept

Decoder

  • For the decoder, autoregressive generation is used: the output sequence is produced one element at a time until an end-of-sentence marker is generated.
  • Typically, an LSTM- or GRU-based RNN is used, where the context vector consists of the final hidden state of the encoder and is used to initialize the first hidden state of the decoder.
  • To avoid the fading influence of the context vector during decoding, one solution is to add the context vector as an input to the computation of every hidden state, not just the first.
  • To keep track of what has already been generated and what hasn’t, condition each output on three parts: the newly generated hidden state, the output generated at the previous step, and the encoder context.
  • Beam search is used to improve the output: instead of independently choosing the argmax at each step (which can yield an unreliable overall sequence), it keeps the k highest-scoring partial hypotheses at every step.
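The decoding loop described above can be sketched in numpy. This is a minimal illustration with hypothetical toy weights and sizes (`W_h`, `W_y`, `W_c`, `W_o`, `H`, `V`, and the `EOS` id are all invented for the example): the encoder context initializes the hidden state, is re-injected at every step so its influence does not fade, and the output is conditioned on the new hidden state, the previous output, and the context.

```python
import numpy as np

rng = np.random.default_rng(0)
H, V = 4, 6                      # hidden size, vocab size (toy values)
EOS = 0                          # hypothetical end-of-sentence token id

# Hypothetical toy weights for a simplified recurrent decoder step.
W_h = rng.normal(0, 0.1, (H, H))          # previous hidden state
W_y = rng.normal(0, 0.1, (H, V))          # previous output (one-hot)
W_c = rng.normal(0, 0.1, (H, H))          # encoder context, fed at every step
W_o = rng.normal(0, 0.1, (V, H + H + V))  # output sees h_t, context, y_{t-1}

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def decode(context, max_len=10):
    """Greedy autoregressive decoding until EOS or max_len."""
    h = context.copy()            # context initializes the first hidden state
    y_prev = one_hot(EOS, V)      # start token (reusing EOS as BOS for brevity)
    out = []
    for _ in range(max_len):
        # Context is re-injected at every step so its influence never fades.
        h = np.tanh(W_h @ h + W_y @ y_prev + W_c @ context)
        # Output conditions on: new hidden state, encoder context, previous output.
        logits = W_o @ np.concatenate([h, context, y_prev])
        tok = int(np.argmax(logits))
        if tok == EOS:
            break
        out.append(tok)
        y_prev = one_hot(tok, V)
    return out

tokens = decode(rng.normal(0, 1.0, H))
print(tokens)
```

A real implementation would use a full LSTM/GRU cell and learned embeddings; the point here is only where the context vector enters the computation.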
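The beam-search point can be made concrete with a toy example. The table of next-token log-probabilities below is invented for illustration: with these numbers, a greedy decoder commits to `a` at the first step (0.6 > 0.4), but every completion through `a` has probability 0.3, while the path through `b` reaches 0.38. A beam of size 2 keeps the `b` hypothesis alive and recovers the higher-probability sequence.

```python
import math

# Toy next-token log-probabilities, keyed by the previous token
# (hypothetical values chosen so greedy and beam search disagree).
LOGPROBS = {
    "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
    "a":   {"</s>": math.log(0.5), "c": math.log(0.5)},
    "b":   {"</s>": math.log(0.95), "c": math.log(0.05)},
    "c":   {"</s>": math.log(1.0)},
}

def beam_search(k=2, max_len=5):
    # Each hypothesis is (tokens, cumulative log-probability).
    beams = [(["<s>"], 0.0)]
    done = []
    for _ in range(max_len):
        cand = []
        for toks, score in beams:
            for nxt, lp in LOGPROBS.get(toks[-1], {}).items():
                hyp = (toks + [nxt], score + lp)
                (done if nxt == "</s>" else cand).append(hyp)
        # Keep the k best partial hypotheses rather than a single argmax.
        beams = sorted(cand, key=lambda h: -h[1])[:k]
        if not beams:
            break
    return max(done, key=lambda h: h[1])

best_tokens, best_score = beam_search()
print(best_tokens, best_score)  # → ['<s>', 'b', '</s>'], log(0.38)
```

Greedy search here would return the `a` path with log-probability log(0.3); keeping two hypotheses is enough to find the globally better sequence.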

Updated 2026-05-02

Tags

Data Science

Ch.5 Inference - Foundations of Large Language Models
