Learn Before

Example of an RL-based Prompt Generator

Sequence Ordering

A team is implementing a reinforcement learning-based system to generate optimized prompts. The system consists of a base large language model (LLM) and a smaller, trainable adaptor network that functions as the policy network. Arrange the following steps to describe a single iteration of the training loop for this system.
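For orientation, a single iteration of such a loop can be sketched as a toy REINFORCE update. This is a minimal sketch under stated assumptions, not the system described in the question: the candidate prompt list, the `frozen_llm` stub, the `reward_fn`, and the learning rate are all hypothetical stand-ins, and the "adaptor" is reduced to a vector of logits over a fixed set of prompts.

```python
import math
import random

# Hypothetical candidate prompts the policy chooses between (assumption).
CANDIDATE_PROMPTS = ["Summarize:", "Explain step by step:", "Answer briefly:"]


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frozen_llm(prompt, task_input):
    """Stand-in for the base LLM; its weights are never updated here."""
    return f"{prompt} {task_input}"


def reward_fn(output):
    """Toy reward (assumption): favor outputs built from the step-by-step prompt."""
    return 1.0 if "step by step" in output else 0.0


def train_iteration(logits, task_input, lr=0.5, rng=random):
    """Run one policy-gradient iteration and return the updated logits."""
    # 1. The policy (adaptor) defines a distribution and samples a prompt.
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    prompt = CANDIDATE_PROMPTS[action]

    # 2. The frozen base LLM produces an output from the sampled prompt.
    output = frozen_llm(prompt, task_input)

    # 3. The output is scored to obtain a scalar reward.
    reward = reward_fn(output)

    # 4. Only the policy is updated. For a softmax policy,
    #    d log pi(a) / d logit_i = 1[i == a] - p_i.
    new_logits = [
        logit + lr * reward * ((1.0 if i == action else 0.0) - probs[i])
        for i, logit in enumerate(logits)
    ]
    return new_logits, prompt, reward
```

Calling `train_iteration` repeatedly, starting from `[0.0, 0.0, 0.0]`, shifts probability mass toward whichever prompt earns reward, while the base LLM stub stays unchanged throughout.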

Updated 2025-10-04
