Comparison of Decoder Objectives in Encoder-Decoder Pre-training
When pre-training an encoder-decoder model, both BERT-style and denoising autoencoding methods feed the encoder a corrupted token sequence in which some tokens are replaced with [MASK] (or [M]). Their decoder objectives, however, differ. In BERT-style training, the decoder computes the loss only for the masked tokens, and the remaining target positions are treated as [MASK] tokens. In denoising autoencoding, by contrast, the decoder must autoregressively reconstruct the entire original token sequence, accumulating the loss over all tokens, as in standard language modeling.
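The difference can be made concrete with a small sketch. The snippet below is a minimal illustration, not code from the book: the tensor names, the MASK_ID value, the fixed mask pattern, and the random logits standing in for a decoder's output are all hypothetical. It only shows where each objective accumulates its cross-entropy loss: at masked positions for the BERT-style objective, and over every position of the original sequence for denoising autoencoding.

```python
import torch
import torch.nn.functional as F

vocab_size, batch, seq_len = 1000, 2, 8
MASK_ID = 4  # hypothetical id of the [MASK] token

# Original sequence and a corrupted copy (what the encoder would receive),
# with positions 2 and 5 replaced by [MASK] for illustration.
original = torch.randint(5, vocab_size, (batch, seq_len))
is_masked = torch.zeros(batch, seq_len, dtype=torch.bool)
is_masked[:, 2] = True
is_masked[:, 5] = True
corrupted = torch.where(is_masked, torch.full_like(original, MASK_ID), original)

# Stand-in for the decoder's output: logits over the vocabulary at every position.
logits = torch.randn(batch, seq_len, vocab_size)

# BERT-style objective: loss only at masked positions; all other target
# positions are ignored (conceptually, they remain [MASK]).
bert_targets = torch.where(is_masked, original, torch.full_like(original, -100))
bert_loss = F.cross_entropy(
    logits.view(-1, vocab_size), bert_targets.view(-1), ignore_index=-100
)

# Denoising autoencoding: the decoder reconstructs the whole original sequence,
# so the loss is accumulated over every position, as in language modeling.
dae_loss = F.cross_entropy(logits.view(-1, vocab_size), original.view(-1))

print(f"BERT-style loss (masked positions only): {bert_loss.item():.3f}")
print(f"Denoising loss (all positions):          {dae_loss.item():.3f}")
```

Note that both objectives start from the same corrupted encoder input; only the target sequence and the set of positions contributing to the loss change.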
Tags
Foundations of Large Language Models
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences