Learn Before
Concept

Sparse Rewards in NLP

In many Natural Language Processing (NLP) applications, such as machine translation, rewards are often sparse. This means that the agent receives a non-zero reward signal only after completing an entire sequence, such as generating a full sentence. For all intermediate steps (e.g., generating individual words), the reward is zero (r_t = 0 for t < T), which can make learning challenging.
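The structure of such a sparse reward signal can be sketched as follows. This is a minimal illustration, not code from a specific library; the function name and the sentence-level score (e.g., a BLEU-style metric) are assumptions for the example.

```python
def sparse_rewards(tokens, sentence_score):
    """Per-step rewards for a generated sequence of length T.

    Illustrative sketch: r_t = 0 for t < T, and only the final
    step receives the sequence-level reward (e.g., a BLEU-style
    score computed on the full sentence).
    """
    T = len(tokens)
    return [0.0] * (T - 1) + [sentence_score]

# Example episode: four generation steps, reward only at the end.
tokens = ["the", "cat", "sat", "."]
rewards = sparse_rewards(tokens, sentence_score=0.87)
print(rewards)  # [0.0, 0.0, 0.0, 0.87]
```

Because every intermediate reward is zero, the agent gets no feedback about which individual word choices helped or hurt, which is what makes credit assignment difficult in this setting.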


Updated 2026-05-02


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences