Concept

Learning Rate and Training Time Trade-off in LLMs

Achieving stable training for large models with gradient descent often requires selecting a small learning rate: large update steps can overshoot along sharp directions of the loss surface, causing loss spikes or outright divergence. This choice introduces a critical trade-off: while a smaller learning rate helps prevent training instability, each update makes less progress, so many more optimization steps are needed to reach the same loss, which significantly increases the overall training time.

Updated 2026-04-21
Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences