Training Strategy for a BERT-based Encoder
A team is building a machine translation model using an encoder-decoder architecture. They use a pre-trained bidirectional language model as the encoder and a randomly initialized model as the decoder. During training on their translation dataset, they 'freeze' all parameters of the pre-trained encoder and update only the parameters of the decoder. Analyze the primary limitation of this training strategy.
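For concreteness, the setup described in the question can be written down as a minimal PyTorch sketch: a pre-trained BERT encoder whose gradients are disabled, paired with a randomly initialized Transformer decoder that is the only part handed to the optimizer. The specific model name, vocabulary size, and hyperparameters below are illustrative assumptions, not details taken from the question.

```python
# Minimal sketch of the strategy in question: a pre-trained bidirectional
# encoder (BERT) is frozen, and only a randomly initialized Transformer
# decoder receives gradient updates. Names and sizes are assumptions.
import torch
import torch.nn as nn
from transformers import BertModel

encoder = BertModel.from_pretrained("bert-base-multilingual-cased")

# Freeze every encoder parameter: no gradients are computed or applied.
for param in encoder.parameters():
    param.requires_grad = False

# Randomly initialized decoder (hidden size must match the encoder's 768).
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=6,
)
tgt_embed = nn.Embedding(32000, 768)   # target-token embeddings, trained from scratch
output_proj = nn.Linear(768, 32000)    # projection to the (assumed) target vocabulary

# Only decoder-side parameters are passed to the optimizer; the encoder
# stays fixed at its pre-trained values for the entire training run.
trainable = (
    list(decoder.parameters())
    + list(tgt_embed.parameters())
    + list(output_proj.parameters())
)
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```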
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Architecture of a BERT-based Encoder-Decoder Model
An NLP team is developing a text summarization system using an encoder-decoder architecture. For the encoder component, they decide to initialize its parameters using a large, pre-trained bidirectional language model that was trained on a massive, general-purpose text corpus. The entire system is then fine-tuned on their specific summarization dataset. What is the primary advantage of this strategy compared to training the encoder from scratch?
Training Strategy for a BERT-based Encoder
When adapting a pre-trained bidirectional language model to serve as the encoder in a sequence-to-sequence architecture for a task like machine translation, it is standard practice to freeze the encoder's parameters and only train the randomly initialized decoder.
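For contrast with the frozen-encoder setup above, the related items describe the alternative of fine-tuning the pre-trained encoder together with the decoder on the downstream dataset. A minimal sketch of that variant, under the same illustrative assumptions as the earlier snippet, differs only in which parameters are handed to the optimizer:

```python
# Alternative from the related items: fine-tune the pre-trained encoder
# together with the randomly initialized decoder (nothing is frozen).
# Model names, sizes, and learning rate are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import BertModel

encoder = BertModel.from_pretrained("bert-base-uncased")
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=6,
)

# All parameters, encoder included, receive gradient updates; a small
# learning rate is typical so the pre-trained weights are not overwritten
# too aggressively early in fine-tuning.
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(decoder.parameters()), lr=2e-5
)
```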