Learn Before
Optimal Parameters Formula in RL Fine-Tuning
In reinforcement learning (RL) fine-tuning, the optimal parameters, denoted as $\hat{\theta}_{\mathrm{rl}}$, are obtained by fine-tuning the pre-trained parameters $\hat{\theta}$. This optimization seeks to maximize an expected reward over the RL fine-tuning dataset, $\mathcal{D}_{\mathrm{rl}}$, using the formula:

$$\hat{\theta}_{\mathrm{rl}} \;=\; \underset{\theta}{\arg\max}\; \mathbb{E}_{x \sim \mathcal{D}_{\mathrm{rl}},\, y \sim \pi_{\theta}(\cdot \mid x)}\big[\, r(x, y) \,\big]$$
In this equation, $\theta$ represents the parameters of the active policy $\pi_{\theta}$ being optimized, while the reward $r(x, y)$ evaluates the paired sample of the input sequence $x$ and the model-generated output $y$.
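The objective above can be sketched numerically. The snippet below is a minimal, self-contained illustration (not the book's implementation): it uses a toy softmax policy over a tiny vocabulary, a hypothetical reward function, a Monte Carlo estimate of the expected reward, and a basic REINFORCE-style update to increase it. All names (`VOCAB`, `reward`, `policy_probs`, etc.) are assumptions introduced for this example.

```python
import math
import random

random.seed(0)

# Toy setup: inputs x are small ints, outputs y come from a tiny vocabulary.
VOCAB = [0, 1, 2]

def reward(x, y):
    # Hypothetical reward r(x, y): prefer the output matching x mod |VOCAB|.
    return 1.0 if y == x % len(VOCAB) else 0.0

def policy_probs(theta, x):
    # Softmax policy pi_theta(y | x) with one logit per (x, y) pair.
    logits = [theta.get((x, y), 0.0) for y in VOCAB]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def expected_reward(theta, dataset, n_samples=200):
    # Monte Carlo estimate of E_{x ~ D, y ~ pi_theta(.|x)}[r(x, y)].
    total = 0.0
    for _ in range(n_samples):
        x = random.choice(dataset)
        y = random.choices(VOCAB, weights=policy_probs(theta, x))[0]
        total += reward(x, y)
    return total / n_samples

def reinforce_step(theta, dataset, lr=0.5):
    # One REINFORCE update: move logits along r(x, y) * grad log pi_theta(y|x).
    x = random.choice(dataset)
    probs = policy_probs(theta, x)
    y = random.choices(VOCAB, weights=probs)[0]
    r = reward(x, y)
    for i, yi in enumerate(VOCAB):
        grad_log = (1.0 if yi == y else 0.0) - probs[i]
        theta[(x, yi)] = theta.get((x, yi), 0.0) + lr * r * grad_log

dataset = [0, 1, 2, 3, 4]
theta = {}
before = expected_reward(theta, dataset)
for _ in range(500):
    reinforce_step(theta, dataset)
after = expected_reward(theta, dataset)
print(f"expected reward before: {before:.2f}, after: {after:.2f}")
```

Note how the response $y$ is sampled from the current policy itself before being scored, which is exactly the on-policy data-sourcing pattern described in the related question below.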
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formulating the Loss Function for Policy Learning in RLHF
A team is refining a language model using a method where, for each training step, a prompt is selected and the model itself generates a response. This prompt-response pair is then used as part of the input for that training step's update calculation. Based on this description, what is the most accurate analysis of the function of the model-generated response in this specific training phase?
Policy Learning in RLHF
Comparing Data Sourcing Strategies
Contrasting Data Sourcing Methods in Model Training
Optimal Parameters Formula in RL Fine-Tuning