1Cademy

Learn Before

Establishing the Initial Policy in RLHF

Multiple Choice

A development team is preparing to use a human feedback-driven process to improve an AI's helpfulness and safety. They have two candidate models to use as their starting point:

Model A: A raw, pre-trained model that is very good at predicting the next word in a sentence but has not been specifically trained to follow user commands.

Model B: A model that has been pre-trained and then further fine-tuned on a dataset of instructions and high-quality answers, making it proficient at following use

Updated 2025-09-29
