Multiple Choice

A team is training a language model to solve multi-step logic puzzles. Their training system automatically reviews each line of the model's generated reasoning. If a line represents a valid deductive step, it receives a positive score. If a line contains a logical fallacy or contradicts a previous statement, it receives a negative score, and the evaluation stops. The total score for the entire reasoning path is then used to update the model. Which classification best describes this type of feedback mechanism?

0

1

Updated 2025-09-26

Contributors are:

Who are from:

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science