Relation

Regularizations used

  1. Residual Dropout: dropout is applied to the output of each sub-layer before it is added to the sub-layer input and normalized. Dropout is also applied to the sums of the embeddings and the positional encodings in both the encoder and decoder stacks. The base model uses a rate of P_drop = 0.1.
  2. Label Smoothing: during training, label smoothing with ε_ls = 0.1 is employed. This hurts perplexity, as the model learns to be more unsure, but improves accuracy and BLEU (Bilingual Evaluation Understudy) score.
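Both regularizers above can be sketched in a few lines of plain Python. This is a minimal illustration, not the original implementation: the helper names (`dropout`, `residual_sublayer`, `smooth_labels`) are made up here, the layer normalization step is omitted for brevity, and the label-smoothing variant shown is the common "mix with a uniform distribution" formulation.

```python
import random

P_DROP = 0.1  # dropout rate used by the base model described above

def dropout(x, p=P_DROP, training=True):
    """Inverted dropout: zero each value with probability p,
    scale survivors by 1/(1-p) so the expected sum is unchanged."""
    if not training or p == 0.0:
        return list(x)
    return [0.0 if random.random() < p else v / (1.0 - p) for v in x]

def residual_sublayer(x, sublayer, p=P_DROP):
    """x + Dropout(Sublayer(x)); the subsequent LayerNorm is omitted here."""
    return [a + b for a, b in zip(x, dropout(sublayer(x), p))]

def smooth_labels(num_classes, target, eps=0.1):
    """Smoothed target: spread eps uniformly over all classes,
    keep the remaining 1 - eps mass on the true class."""
    dist = [eps / num_classes] * num_classes
    dist[target] += 1.0 - eps
    return dist
```

With `smooth_labels(4, 1)`, the true class gets 0.925 and every other class 0.025; the distribution still sums to 1, but the model is no longer pushed toward a hard 0/1 target, which is what trades perplexity for better accuracy and BLEU.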

Updated 2021-08-19

Tags

Data Science