Megatron-Turing NLG

Megatron-Turing NLG is a 530-billion-parameter large language model that uses the GPT-2 Transformer decoder architecture and was trained on a dataset of 270 billion tokens.
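
To put these two figures in proportion, the short sketch below works out the ratio of training tokens to parameters and a rough size for the stored weights. The parameter and token counts come from the sentence above; the 2 bytes per parameter (16-bit storage) is an assumption made only for illustration, not a detail stated in the source.

```python
# Figures stated in the description above.
PARAMS = 530e9   # model parameters
TOKENS = 270e9   # training tokens

# Assumption (not from the source): each parameter is stored in 2 bytes,
# as it would be in a 16-bit floating-point format.
BYTES_PER_PARAM = 2

# Training tokens seen per model parameter.
tokens_per_param = TOKENS / PARAMS

# Size of the weights alone, in gigabytes (1 GB = 1e9 bytes).
weight_gb = PARAMS * BYTES_PER_PARAM / 1e9

print(f"tokens per parameter: {tokens_per_param:.2f}")
print(f"weights at {BYTES_PER_PARAM} bytes/param: {weight_gb:.0f} GB")
```

Under that storage assumption the weights alone would occupy about a terabyte, and the model saw roughly one training token for every two parameters.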

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L