Multiple Choice

A team is training a language model to act as a programming assistant that writes code to solve specific problems. Their training method involves running the code generated by the model. If the code executes without errors and produces the correct output for a set of predefined tests, the model receives a high reward. If the code fails to execute or produces the wrong output, it receives a low reward. The system does not evaluate the elegance, efficiency, or style of the code itself, only the final result of its execution. Which of the following statements best characterizes this evaluation approach?
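The reward rule described in the question can be sketched in a few lines of Python. This is only an illustrative sketch, not the team's actual system: the function name `execution_reward`, the entry-point name `solve`, the numeric reward values, and the sample tests are all assumptions introduced here. A real training setup would also run the generated code in a sandbox rather than with a bare `exec`.

```python
from typing import Any, List, Tuple

HIGH_REWARD = 1.0  # assumed value; the question only says "high"
LOW_REWARD = 0.0   # assumed value; the question only says "low"


def execution_reward(
    code: str,
    tests: List[Tuple[tuple, Any]],
    entry_point: str = "solve",  # hypothetical name the generated code must define
) -> float:
    """Run generated code against predefined tests and score only the result."""
    namespace: dict = {}
    try:
        # Unsafe outside a sandbox; used here only to keep the sketch short.
        exec(code, namespace)
        func = namespace[entry_point]
        for args, expected in tests:
            if func(*args) != expected:
                return LOW_REWARD  # wrong output
    except Exception:
        return LOW_REWARD  # failed to execute (syntax error, crash, missing function)
    return HIGH_REWARD  # ran without errors and matched every expected output


# Hypothetical test set: (arguments, expected output) pairs for an "add" task.
tests = [((2, 3), 5), ((-1, 1), 0)]

clean = "def solve(a, b):\n    return a + b\n"
clumsy = (
    "def solve(a, b):\n"
    "    total = 0\n"
    "    for x in [a, b]:\n"
    "        total = total + x\n"
    "    return total\n"
)
wrong = "def solve(a, b):\n    return a - b\n"
broken = "def solve(a, b)\n    return a + b\n"  # missing colon

for name, src in [("clean", clean), ("clumsy", clumsy), ("wrong", wrong), ("broken", broken)]:
    print(name, execution_reward(src, tests))
```

In this sketch the clean and the clumsy solutions receive the same reward, because nothing in the function inspects the code's style or efficiency; only the wrong and the broken submissions are scored low.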

Updated 2025-09-26

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science