1Cademy - Evaluating a Policy Update for a Chatbot

Learn Before

Increased Action Probability Condition

Case Study

Evaluating a Policy Update for a Chatbot

A chatbot is being trained to be more helpful. When a user says 'I can't find my order,' the chatbot must choose its next action. Before a training update, the reference policy assigned the action 'Provide a link to the order tracking page' a probability of 0.3. After the update, the new policy assigns the same action a probability of 0.75. Analyze this policy change. First, determine whether the new policy favors the action more than the reference policy does. Second, explain why this specific change likely represents a successful training step toward the goal of being more helpful.
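The first part of the analysis comes down to a short calculation. The sketch below assumes that the Increased Action Probability Condition means the ratio of the new policy's probability to the reference policy's probability exceeds 1 (equivalently, that the log-ratio is positive). The function name `probability_ratio` and the variable names are hypothetical, chosen only for illustration.

```python
import math


def probability_ratio(p_new: float, p_ref: float) -> float:
    """Return p_new / p_ref for one action in one state.

    Assumption: both arguments are probabilities, and p_ref is nonzero.
    """
    if not (0.0 < p_ref <= 1.0) or not (0.0 <= p_new <= 1.0):
        raise ValueError("expected probabilities with p_ref > 0")
    return p_new / p_ref


# Values taken from the case study.
p_ref = 0.3   # reference policy: 'Provide a link to the order tracking page'
p_new = 0.75  # new policy, same action, same user message

ratio = probability_ratio(p_new, p_ref)
log_ratio = math.log(ratio)

# Assumed form of the condition: ratio > 1, i.e. log-ratio > 0.
is_more_favored = ratio > 1.0

print(f"ratio = {ratio:.2f}, log-ratio = {log_ratio:.3f}, "
      f"more favored: {is_more_favored}")
```

With these values the ratio is about 2.5, so the new policy favors the action more than the reference policy does. The calculation only shows that the probability rose. Whether the update is a success depends on the second part of the question: linking to the order tracking page directly addresses a user who cannot find their order, so raising its probability is plausibly a step toward helpfulness.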


Updated 2025-10-03
