Concept
Random Model Configuration
Randomly initialized models use the same architecture as mBART: a Transformer with 12 encoder layers and 12 decoder layers, an embedding size of 1024, and 16 self-attention heads.
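As a rough illustration, the hyperparameters above can be written down as a small configuration object. This is only a sketch: the class name `RandomModelConfig` and its field names are hypothetical and do not come from mBART or any library; only the four numeric values are taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RandomModelConfig:
    """Hypothetical container for the hyperparameters listed above."""

    encoder_layers: int = 12   # number of Transformer encoder layers
    decoder_layers: int = 12   # number of Transformer decoder layers
    d_model: int = 1024        # embedding (model) size
    num_heads: int = 16        # self-attention heads per layer

    @property
    def head_dim(self) -> int:
        # Each head works on an equal slice of the embedding,
        # so the embedding size must divide evenly by the head count.
        if self.d_model % self.num_heads != 0:
            raise ValueError("d_model must be divisible by num_heads")
        return self.d_model // self.num_heads


config = RandomModelConfig()
print(config.head_dim)  # 1024 / 16 -> 64
```

With these values each attention head operates on a 64-dimensional slice of the 1024-dimensional embedding.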
Updated 2023-02-17
Tags
Data Science