Learn Before
Example of a User Prompt in RLHF
An example of a user prompt that an LLM might receive at the start of the Reinforcement Learning from Human Feedback (RLHF) process is: 'How can I live a more environmentally friendly life?' A prompt like this is fed to the model to generate multiple candidate responses, which human labelers then rank from best to worst.
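The data-collection step described above can be sketched as a small script. This is a minimal, hypothetical illustration: `mock_generate` is a stand-in for sampling a fine-tuned LLM at a temperature above zero, and the `ranking` field is a placeholder for the ordering a human labeler would supply.

```python
# Hypothetical sketch of RLHF preference-data collection.
# One prompt -> several candidate responses -> a human ranking.

def mock_generate(prompt: str, sample_idx: int) -> str:
    """Stand-in for sampling an LLM; a real pipeline would call the
    model with temperature > 0 so each sample differs."""
    return f"Candidate response {sample_idx} to: {prompt}"

def collect_preference_example(prompt: str, n_samples: int = 4) -> dict:
    """Generate several responses to one prompt and attach a ranking slot."""
    responses = [mock_generate(prompt, i) for i in range(n_samples)]
    # In practice a human labeler fills this in (best to worst);
    # here it is a placeholder identity ordering.
    ranking = list(range(n_samples))
    return {"prompt": prompt, "responses": responses, "ranking": ranking}

example = collect_preference_example(
    "How can I live a more environmentally friendly life?"
)
print(len(example["responses"]))  # 4 candidate responses for one prompt
```

Repeating this loop over thousands of prompts yields the ranked comparison data used to train a reward model.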
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Training a Reward Model with Preference Data
Techniques for Generating Diverse Outputs in RLHF
A team is developing a system to align a language model with human preferences. Their data collection process involves providing a prompt to an existing, fine-tuned model, which then generates a single response. A human labeler then assigns a quality score from 1 to 10 to this single response. This process is repeated for thousands of different prompts. What is the most significant flaw in this methodology for the purpose of creating a robust preference-based reward model?
Arrange the following steps in the correct chronological order to describe the data collection process for training a reward model.
Designing a Data Collection Pipeline for a Creative Writing Assistant
Learn After
Imagine you are part of a team training a new AI assistant. A key step in this process involves providing the AI with a single question, generating multiple different responses to it, and then having human reviewers rank these responses from best to worst. This helps the AI learn what constitutes a high-quality answer. Which of the following questions would be most effective for this specific training step?
Optimizing an AI for Creative Brainstorming
Evaluating Prompts for AI Training