Learn Before
Combined Use of Instruction and Human Preference Alignment
Although instruction alignment and human preference alignment are motivated by different objectives, they are frequently employed in combination to develop well-aligned Large Language Models. In practice the two are typically applied in sequence: instruction alignment (supervised fine-tuning on prompt-response pairs) first teaches the model to follow commands, and human preference alignment (e.g., fine-tuning against a reward model learned from human rankings, as in RLHF) then refines its outputs to be helpful, harmless, and honest. Combining the two leverages the strengths of both methods and yields more robust, comprehensive alignment than either achieves alone.
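As a concrete illustration, the sketch below chains the two stages in miniature: a supervised imitation step on ideal responses, then a Bradley-Terry reward model fit on human rankings, whose scores reweight the policy toward preferred outputs. All data is toy data, and a plain dictionary stands in for a real neural policy; this is a minimal sketch of the idea, not how production RLHF pipelines are implemented.

```python
import math

# Stage 1 data: each prompt paired with one ideal, human-written response.
sft_data = {
    "Translate 'bonjour' to English.": "Hello.",
}

# Stage 2 data: per prompt, model-generated responses ranked best -> worst.
rankings = {
    "Explain quantum computing in a simple way.": [
        "Think of a qubit as a coin that can be heads and tails at once.",   # best
        "Quantum computing exploits superposition and entanglement.",
        "Qubits evolve under unitaries in a 2^n-dimensional Hilbert space.", # worst
    ],
}

# --- Stage 1: instruction alignment (toy SFT) -------------------------------
# The "policy" maps a prompt to a probability distribution over responses;
# supervised fine-tuning amounts to imitating the demonstration exactly.
policy = {prompt: {ideal: 1.0} for prompt, ideal in sft_data.items()}

# --- Stage 2a: fit a Bradley-Terry reward model on the rankings -------------
# Each response gets a scalar score; gradient ascent on the log-likelihood of
# every adjacent (better, worse) pair pushes preferred responses higher.
scores = {r: 0.0 for ranked in rankings.values() for r in ranked}
lr = 0.1
for _ in range(500):
    for ranked in rankings.values():
        for better, worse in zip(ranked, ranked[1:]):
            # p(better beats worse) = sigmoid(score_better - score_worse)
            p = 1.0 / (1.0 + math.exp(scores[worse] - scores[better]))
            scores[better] += lr * (1.0 - p)
            scores[worse] -= lr * (1.0 - p)

# --- Stage 2b: preference alignment (reward-weighted policy) ----------------
# Reweight the policy by exp(reward / beta), a softmax over learned rewards.
beta = 1.0
for prompt, ranked in rankings.items():
    weights = {r: math.exp(scores[r] / beta) for r in ranked}
    z = sum(weights.values())
    policy[prompt] = {r: w / z for r, w in weights.items()}

for prompt, dist in policy.items():
    print(prompt)
    for response, prob in sorted(dist.items(), key=lambda kv: -kv[1]):
        print(f"  {prob:.2f}  {response}")
```

After the second stage, the human-preferred response carries most of the probability mass, while the instruction-following behavior learned in the first stage is preserved for prompts with demonstrations.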
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Surrogate Objectives in AI Alignment
Differing Motivations of Instruction and Human Preference Alignment
Instruction Alignment
Human Preference Alignment via Reward Models
A development team is working to improve a large language model's behavior. They create two distinct datasets:
- Dataset 1: A curated set of prompts, each paired with a single, ideal, human-written response that demonstrates how to follow the prompt's instructions correctly.
- Dataset 2: A set of prompts where, for each prompt, a human evaluator has ranked several different model-generated responses from best to worst.
Which statement best analyzes the relationship between these datasets and the two fundamental approaches to model alignment?
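For concreteness, the two record types in this scenario might be represented as follows; the field names are hypothetical, and the example prompts are drawn from elsewhere on this page:

```python
# Dataset 1: instruction alignment (supervised fine-tuning) — each prompt is
# paired with a single ideal, human-written demonstration.
sft_record = {
    "prompt": "Summarize this article in three bullet points.",
    "ideal_response": "- point one\n- point two\n- point three",
}

# Dataset 2: human preference alignment — several model-generated responses
# per prompt, ranked best to worst by a human evaluator.
preference_record = {
    "prompt": "Explain quantum computing in a simple way.",
    "ranked_responses": [
        "A plain-language analogy a layperson can follow.",   # ranked best
        "A correct but jargon-heavy technical explanation.",  # ranked worst
    ],
}
```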
Match each fundamental model alignment approach with its primary goal and typical implementation method.
Prioritizing Chatbot Alignment Strategies
Learn After
An AI development team has fine-tuned a large language model primarily to follow user commands. The model excels at tasks with clear, explicit instructions (e.g., 'Summarize this article in three bullet points'). However, for more open-ended prompts (e.g., 'Explain quantum computing in a simple way'), its responses are often factually correct but overly technical, verbose, and not genuinely helpful for a layperson. Which of the following strategies best addresses this specific shortcoming by building upon the model's existing capabilities?
Analyzing LLM Alignment Contributions
A development team is creating a new large language model using a two-stage alignment process. First, they train the model to follow a wide range of commands. Second, they refine the model to ensure its responses are helpful, harmless, and honest. Match each desired model behavior below to the alignment stage that is primarily responsible for achieving it.
Preference Models as a Sequential Step for Generalization