Formula for Pruned Step-wise Expansion of the Hypothesis Set
In practical decoding algorithms, the search space contains a number of candidate sequences that grows exponentially with sequence length. To keep the computational load tractable, the space is pruned as decoding proceeds. At each decoding step i, the set of candidate sequences Y_i is formed by applying a pruning function to the full set of expanded hypotheses:

Y_i = Prune(Y_{i-1} × V)

where Y_{i-1} is the candidate set from the previous step and V is the vocabulary. The Prune function selectively removes sequences that are unlikely to lead to high-quality outputs. Consequently, the number of sequences under consideration is drastically reduced, ensuring that |Y_i| ≪ |Y_{i-1} × V|.
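The step above can be sketched in code as a beam-search-style expansion. This is a minimal illustration, not the book's implementation: the vocabulary, the scoring function, and the beam width k are hypothetical stand-ins for a real language model.

```python
import heapq

def step_scores(prefix, vocab):
    # Stand-in for a language model: assign each vocabulary word an
    # arbitrary deterministic log-probability (hypothetical values).
    return {w: -0.1 * (len(prefix) + i) for i, w in enumerate(vocab)}

def expand_and_prune(Y_prev, vocab, k):
    """One decoding step: Y_i = Prune(Y_{i-1} x V), keeping the k best."""
    candidates = []
    for prefix, score in Y_prev:
        # Expansion: append every vocabulary word to every current sequence.
        for w, logp in step_scores(prefix, vocab).items():
            candidates.append((prefix + [w], score + logp))
    # Pruning: keep only the k highest-scoring sequences.
    return heapq.nlargest(k, candidates, key=lambda c: c[1])

vocab = ["a", "b", "c", "d"]
Y = [([], 0.0)]            # start from the empty sequence
for _ in range(3):         # three decoding steps
    Y = expand_and_prune(Y, vocab, k=2)

# Without pruning, |Y_i| would grow like |V|^i (here 4**3 = 64);
# pruning caps it at k sequences per step.
print(len(Y))  # 2
```

The key design point is that pruning happens once per step, so the cost per step stays bounded by k × |V| candidates regardless of how long the sequence grows.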

Tags
Ch.5 Inference - Foundations of Large Language Models
Computing Sciences