Sequence Ordering

A team is implementing a reinforcement learning-based system to generate optimized prompts. The system consists of a base large language model (LLM) and a smaller, trainable adaptor network that functions as the policy network. Arrange the following steps to describe a single iteration of the training loop for this system.
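For orientation, one iteration of such a loop might be sketched as below. This is a toy under stated assumptions, not the system the question describes: the update rule is plain REINFORCE (the question does not name an algorithm), the adaptor is reduced to a vector of logits over a fixed set of candidate prompts, and `CANDIDATE_PROMPTS`, `frozen_llm`, `reward_fn`, and `train_step` are hypothetical stand-ins.

```python
import math
import random

# Hypothetical action space: the adaptor picks one of these prompts.
CANDIDATE_PROMPTS = ["Summarize:", "Summarize briefly:", "TL;DR:"]


def softmax(logits):
    """Convert adaptor logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frozen_llm(prompt):
    """Stand-in for the base LLM; its behavior is never updated here."""
    return prompt.lower()


def reward_fn(output):
    """Toy reward (assumption): 1.0 if the output mentions 'briefly'."""
    return 1.0 if "briefly" in output else 0.0


def train_step(logits, lr=0.5, rng=random):
    """One iteration: sample a prompt, query the LLM, score, update the adaptor."""
    probs = softmax(logits)
    # 1. The adaptor (policy) samples a prompt.
    action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    # 2. The frozen base LLM produces an output for that prompt.
    output = frozen_llm(CANDIDATE_PROMPTS[action])
    # 3. A reward is computed from the output.
    reward = reward_fn(output)
    # 4. REINFORCE update: grad of log pi(action) w.r.t. logits is onehot - probs.
    new_logits = [
        logit + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]
    return new_logits, action, reward
```

Calling `train_step` repeatedly, starting from `[0.0, 0.0, 0.0]`, shifts probability toward the rewarded prompt while the stand-in LLM stays fixed; only the adaptor's parameters change.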

Updated 2025-10-04

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science