Case Study

Evaluating a Policy Update for a Chatbot

A chatbot is being trained to be more helpful. When a user says 'I can't find my order,' the chatbot must decide on its next action. Before a training update, the reference policy assigned the action 'Provide a link to the order tracking page' a probability of 0.3. After the update, the new policy assigns the same action a probability of 0.75.

Analyze this policy change. First, determine whether the new policy favors the action more than the reference policy does. Second, explain why this specific change likely represents a successful training step toward the goal of being more helpful.
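One common way to make the first question concrete is to compare the two probabilities as a ratio, new over reference. The sketch below is only an illustration under that assumption: the function name `policy_ratio` and the variable names are hypothetical, and the case study itself does not prescribe a specific formula. A ratio above 1 (equivalently, a positive log-ratio) means the new policy favors the action more than the reference policy does.

```python
import math


def policy_ratio(p_new: float, p_ref: float) -> float:
    """Return p_new / p_ref for a single action in a fixed state.

    Hypothetical helper for illustration; assumes both inputs are
    valid probabilities and that p_ref is strictly positive.
    """
    if not (0.0 < p_ref <= 1.0):
        raise ValueError("p_ref must be in (0, 1]")
    if not (0.0 <= p_new <= 1.0):
        raise ValueError("p_new must be in [0, 1]")
    return p_new / p_ref


# Values taken from the case study.
p_ref = 0.3   # reference policy: probability of linking to order tracking
p_new = 0.75  # new policy: probability of the same action

ratio = policy_ratio(p_new, p_ref)  # about 2.5
log_ratio = math.log(ratio)         # positive when the action is more favored
more_favored = ratio > 1.0

print(f"ratio = {ratio:.2f}, log-ratio = {log_ratio:.3f}, "
      f"more favored: {more_favored}")
```

With these numbers the ratio is about 2.5, so the action is roughly two and a half times as likely under the new policy. Since pointing the user to the order tracking page directly addresses 'I can't find my order,' shifting probability toward that action is consistent with the stated goal of being more helpful.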

Updated 2025-10-03

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science