Learn Before
Multiple Choice

A startup with a limited computational budget wants to align a language model with human preferences. They have a high-quality but static dataset of prompts, where each prompt is paired with a 'preferred' response and a 'rejected' response. A key constraint is that they cannot afford to repeatedly generate new samples from the model for evaluation during the training loop. Which of the following alignment strategies is the most practical and efficient for this startup to adopt?
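The constraints in the scenario (a fixed offline preference dataset, no on-policy sampling during training) describe the setting targeted by offline preference-optimization methods such as Direct Preference Optimization (DPO), which trains directly on (preferred, rejected) pairs without a separate reward model or repeated generation. As an illustration only, here is a minimal sketch of the per-example DPO loss; the function name, argument names, and the beta value are hypothetical, and the log-probabilities are assumed to be the summed token log-probabilities of each full response.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed log-probabilities of each response."""
    # Implicit reward margin: how much the trainable policy has shifted
    # toward the preferred response relative to the frozen reference model.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), written as softplus(-margin) = log(1 + e^-margin)
    # for numerical stability.
    return math.log1p(math.exp(-margin))

# With no shift from the reference model, the margin is 0 and the
# loss is log(2).
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931
```

Because the loss needs only log-probabilities of responses already in the dataset, each training step is a forward/backward pass with no sampling, which is what makes this family of methods attractive under a tight compute budget.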


Updated 2025-10-02


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science
