Multiple Choice

An engineer is training a system using a reinforcement learning approach. The system's behavior is determined by a set of adjustable parameters. The training process aims to find the parameter values that maximize a specific 'performance function,' which represents the expected cumulative reward. The engineer runs two separate training procedures, Procedure X and Procedure Y, and observes the following final outcomes:

  • Procedure X: The final set of parameters results in a performance function value of 150.
  • Procedure Y: The final set of parameters results in a performance function value of 125. However, Procedure Y completed in half the time of Procedure X.
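
Written in symbols, the training objective described above might look like the sketch below. The notation is an assumption for illustration and does not come from the question: θ for the adjustable parameters, π_θ for the behavior they induce, τ for a trajectory, r_t for the reward at step t, T for the horizon, and J for the performance function.

```latex
% Performance function: expected cumulative reward under the
% behavior induced by parameters \theta (assumed notation).
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=1}^{T} r_t \right]

% Training objective: find the parameters that maximize J.
\theta^{*} = \arg\max_{\theta} J(\theta)
```

In this notation, the reported values 150 and 125 are the values of J at the final parameters of Procedure X and Procedure Y, respectively.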

Which statement best evaluates the outcomes in relation to the primary training objective?

Updated 2025-09-29

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science