Example

Batch Size for Sequential Data in A2C Value Loss

When computing the value network loss in the Advantage Actor-Critic (A2C) algorithm on sequential data, the number of training samples, M, can be set equal to the length of the sequence. For instance, if the input is a sequence of T tokens, then M = T, so each token position contributes one term to the loss.
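As a minimal sketch of this idea, the snippet below averages a squared-error value loss over the T positions of a single sequence, so the normalizer M equals T. The function name `value_loss`, the use of plain mean squared error, and the numbers in `values` and `targets` are assumptions made for illustration; the source does not specify how the value predictions or their targets are obtained.

```python
from typing import Sequence


def value_loss(values: Sequence[float], targets: Sequence[float]) -> float:
    """Squared-error value loss for one sequence, averaged over M = T positions.

    values[t]  : value estimate at token position t (hypothetical input)
    targets[t] : regression target at position t, e.g. a return estimate (hypothetical input)
    """
    if len(values) != len(targets):
        raise ValueError("values and targets must have the same length")
    M = len(values)  # M = T: one training sample per token position
    if M == 0:
        raise ValueError("sequence must contain at least one token")
    return sum((v - y) ** 2 for v, y in zip(values, targets)) / M


# Illustrative sequence of T = 4 tokens, so the loss is averaged over M = 4 terms.
values = [0.5, 1.0, 1.5, 2.0]
targets = [1.0, 1.0, 1.0, 1.0]
loss = value_loss(values, targets)
print(loss)
```

With several sequences in a batch, one possible extension under the same assumption is to sum the squared errors over all sequences and divide by the total number of tokens.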


Updated 2025-10-10


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences