Training Data
- Training data included the standard WMT 2014 English-German dataset consisting of about 4.5 million sentence pairs. Sentences were encoded using byte-pair encoding.
- For English-French, we used the significantly larger WMT 2014 English-French dataset consisting of 36M sentences. Sentence pairs were batched together by approximate sequence length.
- Each training batch contained a set of sentence pairs containing approximately 25000 source tokens and 25000 target tokens.
- Optimizer: Adam with β1 = 0.9, β2 = 0.98 and ε = 10^-9.
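
The batching and optimizer settings listed above can be illustrated with a minimal sketch in plain Python. It is an illustration under stated assumptions, not the original training code: the function names `batch_by_tokens` and `adam_step` are hypothetical, token counts ignore padding, and the learning rate `lr` is a free argument because the list does not specify one.

```python
import math

# Values taken from the list above.
MAX_TOKENS = 25000                      # approximate source/target tokens per batch
BETA1, BETA2, EPS = 0.9, 0.98, 1e-9     # Adam hyperparameters


def batch_by_tokens(pairs, max_tokens=MAX_TOKENS):
    """Group (source_tokens, target_tokens) pairs into batches.

    Pairs are sorted by length so each batch holds sentences of similar
    length. A batch is closed once adding the next pair would push either
    the source or the target token count above max_tokens.
    """
    ordered = sorted(pairs, key=lambda p: (len(p[0]), len(p[1])))
    batches, current, src_count, tgt_count = [], [], 0, 0
    for src, tgt in ordered:
        over = (src_count + len(src) > max_tokens
                or tgt_count + len(tgt) > max_tokens)
        if current and over:
            batches.append(current)
            current, src_count, tgt_count = [], 0, 0
        current.append((src, tgt))
        src_count += len(src)
        tgt_count += len(tgt)
    if current:
        batches.append(current)
    return batches


def adam_step(param, grad, m, v, step, lr,
              beta1=BETA1, beta2=BETA2, eps=EPS):
    """One Adam update for a single scalar parameter (step counts from 1)."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - beta1 ** step)               # bias correction
    v_hat = v / (1 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

In practice a deep learning framework's built-in Adam optimizer would take the same three values as arguments; the scalar version here only shows where β1, β2 and ε enter the update.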
Updated 2026-05-03
Tags
Data Science