Multiple Choice

In a reinforcement learning agent using an actor-critic architecture, the critic network is being trained. For a given state transition, the network makes the following predictions:

  • Predicted value for the current state: 15.0
  • Predicted value for the next state: 20.0

The agent receives a reward of 5.0 for the transition, and the discount factor is 0.9.

Based on this single experience, how should the critic network's parameters be adjusted to minimize its loss?
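The arithmetic behind this question can be sketched as follows. The sketch assumes the critic is trained with a one-step temporal-difference (TD) target and a squared-error loss, with the next-state value treated as a fixed target; the question does not state the loss, so this is an assumption, and the helper names `td_target` and `td_error` are hypothetical.

```python
def td_target(reward: float, gamma: float, next_value: float) -> float:
    """One-step TD target: r + gamma * V(s')."""
    return reward + gamma * next_value


def td_error(reward: float, gamma: float, value: float, next_value: float) -> float:
    """TD error: target minus the critic's current estimate V(s)."""
    return td_target(reward, gamma, next_value) - value


# Numbers taken from the question.
reward, gamma = 5.0, 0.9
v_current, v_next = 15.0, 20.0

target = td_target(reward, gamma, v_next)            # 5.0 + 0.9 * 20.0
delta = td_error(reward, gamma, v_current, v_next)   # target - 15.0

# Assumed loss: 0.5 * (target - V(s))^2, with the target held constant.
loss = 0.5 * delta ** 2
# Gradient of that loss with respect to V(s) is -(target - V(s)).
grad_wrt_value = -delta

# A positive TD error means the critic underestimated V(s), so a gradient
# descent step should move its prediction for the current state upward.
direction = "increase" if delta > 0 else "decrease" if delta < 0 else "keep"
print(f"target={target:.1f}, td_error={delta:.1f}, loss={loss:.1f}, {direction} V(s)")
```

Under these assumptions the target is 23.0 and the TD error is +8.0, so the update should raise the predicted value of the current state from 15.0 toward 23.0.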

Updated 2025-09-28

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science