Learn Before
Teacher Forcing
Teacher forcing is a common training strategy for sequence models in which the ground-truth token from the previous time step is used as the input at each step, rather than the model's own prediction. In an encoder–decoder architecture, this means feeding the original target sequence into the decoder. Specifically, the decoder input is a special beginning-of-sequence token (e.g., `<bos>`) followed by the target sequence without its final token. The decoder is then trained to predict the original target sequence, which is the decoder input shifted by one time step and which ends with an end-of-sequence token (e.g., `<eos>`). This shifted, self-supervised setup closely resembles standard language model training, where each token is predicted from the tokens that precede it.
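The shift described above can be illustrated with a minimal sketch in plain Python. The function name `make_teacher_forcing_pair` and the example tokens are hypothetical, chosen only for illustration, and the sketch assumes the target sequence already ends with `<eos>`; a real pipeline would typically work on padded batches of token indices rather than lists of strings.

```python
BOS, EOS = "<bos>", "<eos>"


def make_teacher_forcing_pair(target_tokens):
    """Build (decoder_input, labels) for one target sequence.

    Assumption: target_tokens already ends with the <eos> token.
    """
    if not target_tokens or target_tokens[-1] != EOS:
        raise ValueError("target sequence is assumed to end with <eos>")
    # Decoder input: <bos> followed by the target without its final token.
    decoder_input = [BOS] + list(target_tokens[:-1])
    # Labels: the original target, i.e. the decoder input shifted by one step.
    labels = list(target_tokens)
    return decoder_input, labels


# Hypothetical target sentence, already tokenized.
target = ["ils", "regardent", ".", EOS]
decoder_input, labels = make_teacher_forcing_pair(target)

# At each step the decoder sees the ground-truth previous token
# and is trained to predict the next one.
for step, (inp, lab) in enumerate(zip(decoder_input, labels)):
    print(step, inp, "->", lab)
```

Both sequences have the same length, so the loss can be computed position by position; at inference time no ground truth is available, and the decoder must consume its own previous prediction instead.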
Tags
Data Science
D2L
Dive into Deep Learning @ D2L
Related
Vanishing/exploding gradient
Helpful Website for BPTT
Weight tying
Computational Graph of RNN Backpropagation Through Time
Gradient of RNN Objective with Respect to Output Weights
Teacher Forcing
Auto-regressive Decoding in Machine Translation
An autoregressive sequence generation model is tasked with producing an output. At each step, it calculates the probability for every possible next element and selects the single element with the highest probability before moving to the next step. What is the primary limitation of this step-by-step selection strategy?
Decoder Input Analysis
Diagnosing Translation Degradation
Beam Search Strategy in Sequence-to-Sequence Models