Learn Before
Choosing a Generation Combination Strategy
Based on the scenario provided, which method (A or B) directly implements the principle of averaging predictions at the token level? Justify your choice by explaining the fundamental difference in how the two methods combine model outputs to produce the final text.
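Since the scenario defining methods A and B is not reproduced on this card, the sketch below only illustrates the general distinction the question targets: one strategy merges models at every generation step by averaging their token-level probability distributions, while the other lets each model generate a complete text and combines only the finished outputs. The function names and the length-based scoring rule are illustrative assumptions, not part of the original scenario.

```python
def combine_token_level(dists):
    """Average per-step probability distributions, then pick the argmax token.

    This merges the models' beliefs *before* a token is committed to."""
    avg = {tok: sum(d[tok] for d in dists) / len(dists) for tok in dists[0]}
    return max(avg, key=avg.get)

def combine_output_level(candidates, score):
    """Each model generates a complete text independently; a scoring rule
    then selects among the finished outputs. No per-token mixing occurs."""
    return max(candidates, key=score)

# Token-level: decisions are merged at every generation step.
step_dists = [{"warm": 0.6, "sunny": 0.4},
              {"warm": 0.2, "sunny": 0.8}]
print(combine_token_level(step_dists))  # sunny

# Output-level: full candidate texts are compared only after generation
# (here scored by length purely for illustration).
candidates = ["The weather today is exceptionally warm.",
              "The weather today is exceptionally sunny."]
print(combine_output_level(candidates, score=len))
```

The key contrast: token-level averaging can produce a continuation that neither model would have generated on its own, whereas output-level combination can only return one of the texts the individual models actually produced.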
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Formula for Token-Level Model Averaging in Prompt Ensembling
Imagine two language models are tasked with completing the sentence: 'The weather today is exceptionally...'. At this specific step, they must choose the very next word. Their internal calculations produce the following probability scores for the top three candidate words:
- Model 1: warm (0.6), sunny (0.3), bright (0.1)
- Model 2: warm (0.2), sunny (0.7), bright (0.1)
If a system combines these models by averaging their token-level probability distributions to make a decision, which word will it select as the next word in the sequence, and why?
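The selection can be checked directly by averaging the two distributions from the question (a minimal sketch; the dictionaries simply restate the probabilities given above):

```python
# Token-level probabilities from the question.
model_1 = {"warm": 0.6, "sunny": 0.3, "bright": 0.1}
model_2 = {"warm": 0.2, "sunny": 0.7, "bright": 0.1}

# Average the two distributions token by token:
# warm -> (0.6 + 0.2) / 2 = 0.4
# sunny -> (0.3 + 0.7) / 2 = 0.5
# bright -> (0.1 + 0.1) / 2 = 0.1
averaged = {tok: (model_1[tok] + model_2[tok]) / 2 for tok in model_1}

# The token with the highest averaged probability is selected.
print(max(averaged, key=averaged.get))  # sunny
```

Although each model individually puts substantial mass on "warm", averaging gives "sunny" the highest combined probability (0.5 vs. 0.4), so the ensemble picks "sunny".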
Analysis of Text Generation Combination Methods
Choosing a Generation Combination Strategy