
Training Data

  1. For English-German, training used the standard WMT 2014 English-German dataset, consisting of about 4.5 million sentence pairs. Sentences were encoded using byte-pair encoding (BPE).
  2. For English-French, we used the significantly larger WMT 2014 English-French dataset, consisting of 36M sentences.
  3. Sentence pairs were batched together by approximate sequence length. Each training batch contained sentence pairs totaling approximately 25000 source tokens and 25000 target tokens.
  4. Optimizer: Adam with β₁ = 0.9, β₂ = 0.98 and ε = 10⁻⁹.
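The note says sentences were byte-pair encoded but gives no details. The sketch below illustrates the standard greedy BPE procedure: repeatedly merge the most frequent adjacent symbol pair, then segment new words by replaying those merges in order. The word counts are a made-up toy table, and the function names (`learn_bpe`, `apply_bpe`, `merge_pair`) are hypothetical; real tokenizers run many thousands of merges over the full corpus.

```python
from collections import Counter

END = "</w>"  # end-of-word marker so merges never cross word boundaries


def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` in `symbols` with one fused symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Greedily learn up to `num_merges` merge operations from a word-frequency table."""
    # Start with every word split into single characters plus the end marker.
    vocab = {tuple(word) + (END,): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)  # ties: first pair seen wins
        merges.append(best)
        vocab = {merge_pair(s, best): f for s, f in vocab.items()}
    return merges


def apply_bpe(word: str, merges: list[tuple[str, str]]) -> tuple[str, ...]:
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}  # toy data, not WMT
merges = learn_bpe(toy_counts, num_merges=10)
print(merges[:3])  # → [('e', 's'), ('es', 't'), ('est', '</w>')]
print(apply_bpe("lowest", merges))  # → ('low', 'est</w>')
```

The unseen word "lowest" is split into the known subwords "low" and "est</w>", which is the property that lets a fixed subword vocabulary cover rare and novel words.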
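The batching scheme described above (sentence pairs grouped by approximate sequence length, roughly 25000 source and 25000 target tokens per batch) can be sketched as a sort followed by a greedy token budget. This is a minimal illustration, not the original training code: the function name `batch_by_length` is hypothetical, and it assumes the budget counts real (unpadded) tokens, checked separately on the source and target sides.

```python
Pair = tuple[list[str], list[str]]  # (source tokens, target tokens)


def batch_by_length(pairs: list[Pair], max_tokens: int = 25000) -> list[list[Pair]]:
    """Group sentence pairs into length-sorted batches under a per-side token budget.

    Assumption: the budget counts unpadded tokens on each side independently.
    """
    # Sorting by length puts sentences of similar length next to each other,
    # which keeps padding inside each batch small.
    ordered = sorted(pairs, key=lambda p: (len(p[0]), len(p[1])))
    batches, current = [], []
    src_count = tgt_count = 0
    for src, tgt in ordered:
        over_budget = (src_count + len(src) > max_tokens
                       or tgt_count + len(tgt) > max_tokens)
        if current and over_budget:
            # Close the current batch before either side exceeds the budget.
            batches.append(current)
            current, src_count, tgt_count = [], 0, 0
        # A single pair longer than max_tokens still gets a batch of its own.
        current.append((src, tgt))
        src_count += len(src)
        tgt_count += len(tgt)
    if current:
        batches.append(current)
    return batches


# Toy example with a tiny budget so the grouping is visible.
toy = [(["x"] * n, ["y"] * n) for n in (3, 1, 2, 5, 4)]
for batch in batch_by_length(toy, max_tokens=6):
    print([len(src) for src, _ in batch])
# [1, 2, 3]
# [4]
# [5]
```

Budgeting by tokens rather than by sentence count keeps the amount of work per batch roughly constant, so batches of short sentences hold many pairs and batches of long sentences hold few. In practice the finished batches are usually shuffled before each epoch so the model does not see lengths in sorted order.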
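Adam keeps exponential moving averages of each gradient and of its square; β₁ and β₂ are the decay rates of those averages, and ε is a small constant that guards against division by zero. The sketch below implements one Adam step in plain Python with the hyperparameters listed above. The learning rate is not given in this note, so `lr` here is a placeholder assumption, and the `Adam` class is a hypothetical illustration rather than any library's implementation.

```python
import math


class Adam:
    """Minimal Adam optimizer over a flat list of float parameters.

    Defaults follow the note: beta1 = 0.9, beta2 = 0.98, eps = 1e-9.
    The learning rate is a placeholder assumption.
    """

    def __init__(self, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.98, eps: float = 1e-9):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m: list[float] = []  # running mean of gradients (first moment)
        self.v: list[float] = []  # running mean of squared gradients (second moment)
        self.t = 0                # step counter, used for bias correction

    def step(self, params: list[float], grads: list[float]) -> list[float]:
        """Return the parameters after one Adam update."""
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias correction: m and v start at zero, so early estimates are rescaled.
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


opt = Adam(lr=0.01)
params = opt.step([1.0, -2.0, 0.5], grads=[0.5, -3.0, 0.0])
print(params)  # approximately [0.99, -1.99, 0.5]
```

On the first step the bias-corrected ratio m̂ / √v̂ is ±1 for any nonzero gradient, so each parameter moves by about `lr` regardless of gradient magnitude, and a parameter with zero gradient does not move. In PyTorch the same hyperparameters would be passed as `torch.optim.Adam(model.parameters(), lr=..., betas=(0.9, 0.98), eps=1e-9)`.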

Updated 2026-05-03

Tags

Data Science