1Cademy - Teacher Forcing

Learn Before

Backpropagation Through Time (BPTT)
Decoder

Concept

Teacher Forcing

Teacher forcing is a common training strategy for sequence models in which the input at each time step is the ground-truth token from the previous time step, rather than the model's own prediction. In an encoder–decoder architecture, this means feeding the target sequence itself into the decoder. Specifically, a special beginning-of-sequence token (e.g., `<bos>`) is prepended to the target sequence and the final token is dropped, so the decoder input has the same length as the target. The decoder is then trained to predict the original target sequence, which is the decoder input shifted by one time step and ends with an end-of-sequence token (e.g., `<eos>`). This shifting scheme for self-supervised learning closely resembles standard language model training.
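
The shift described above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the helper name `teacher_forcing_pair`, the example sentence, and the assumption that the target is already tokenized and ends with `<eos>` are not from the source.

```python
from typing import List, Tuple

BOS = "<bos>"  # beginning-of-sequence marker (assumed token string)
EOS = "<eos>"  # end-of-sequence marker (assumed token string)


def teacher_forcing_pair(target: List[str]) -> Tuple[List[str], List[str]]:
    """Build (decoder_input, decoder_label) for teacher forcing.

    Hypothetical helper: assumes `target` is a token list ending in <eos>.
    """
    if not target or target[-1] != EOS:
        raise ValueError("target is expected to end with <eos>")
    # Decoder input: <bos> followed by the target without its final token.
    decoder_input = [BOS] + target[:-1]
    # Decoder label: the original target, i.e. the input shifted by one step.
    decoder_label = list(target)
    return decoder_input, decoder_label


# Made-up example sentence, used only to show the alignment.
target = ["ils", "regardent", ".", EOS]
dec_in, dec_out = teacher_forcing_pair(target)

# At every step the decoder sees the ground-truth previous token
# and is trained to predict the next one.
for step, (x, y) in enumerate(zip(dec_in, dec_out)):
    print(f"step {step}: input={x!r} -> predict {y!r}")
```

Both lists have the same length, so each decoder input position lines up with exactly one label. At inference time no ground truth is available, so the model's own previous prediction would take the place of `dec_in[step]`.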