Tags: Ch.5 Inference - Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Data Science
Related
Mathematical Justification for Greedy Search
Construction of the Optimal Sequence in Greedy Search
Candidate Set in Greedy Search
A language model is generating a two-token sequence. At the first step, it calculates the probability for the next token: 'Token A' has a probability of 0.6, and 'Token B' has a probability of 0.4. If the model chooses 'Token A', the most probable subsequent token is 'Token C' (with a conditional probability of 0.5). If the model had chosen 'Token B', the most probable subsequent token would be 'Token D' (with a conditional probability of 0.9). A text generation algorithm is used that, at every step, commits to the single token with the highest immediate probability. Based on this process, which sequence will be generated and why?
Algorithm Suitability for Text Generation Tasks
When generating a sequence of text, an algorithm that selects the single most probable token at each step is guaranteed to produce the overall most probable sequence.
Analyzing Suboptimal Outcomes in Text Generation
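The suboptimal outcome in the two-token question above can be worked through in a few lines. This sketch uses the token names and probabilities from the question: greedy decoding commits to 'A' then 'C' for a joint probability of 0.30, even though 'B' then 'D' would score 0.36.

```python
# Two-step distributions from the question:
# step 1: P(A) = 0.6, P(B) = 0.4
# step 2: P(C | A) = 0.5, P(D | B) = 0.9 (most probable follow-ups)
step1 = {"A": 0.6, "B": 0.4}
step2 = {"A": {"C": 0.5}, "B": {"D": 0.9}}

# Greedy decoding: commit to the argmax at every step.
first = max(step1, key=step1.get)                   # "A"
second = max(step2[first], key=step2[first].get)    # "C"
greedy_seq = (first, second)
greedy_prob = step1[first] * step2[first][second]   # 0.6 * 0.5 = 0.30

# Exhaustive search over both branches shows greedy is not globally optimal:
best_seq = max(
    ((t1, t2) for t1 in step1 for t2 in step2[t1]),
    key=lambda s: step1[s[0]] * step2[s[0]][s[1]],
)
# ("B", "D") has joint probability 0.4 * 0.9 = 0.36 > 0.30
```

This is exactly why the "guaranteed to produce the overall most probable sequence" claim above is false: maximizing each step's probability does not maximize the product over the sequence.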
Selecting and Justifying a Decoding Policy for Two Production Use Cases
Debugging Decoding: Balancing Determinism, Diversity, and Length in a Regulated Product
Post-incident analysis: fixing repetition and truncation by tuning decoding
Choosing a Decoding Configuration Under Latency, Diversity, and Length Constraints
Release-readiness decision: decoding configuration for a customer-facing summarization feature
Decoding policy decision for a multilingual support assistant under safety, latency, and verbosity constraints
Beam Search
Beam Width (K)
Top-K Token Selection in Beam Search
A text generation model is creating a sequence of words. It uses a search process that keeps track of the 2 most probable sequences at each step. The score for a sequence is the sum of the log-probabilities of its words. Given the state of the search below, which two sequences will be kept for the next step?
Step 1: The initial two sequences being tracked are:
- Sequence 1: "The" (Score: -0.5)
- Sequence 2: "A" (Score: -0.9)
Step 2: The model calculates the log-probabilities for the next possible words for each sequence:
- Expanding "The":
  - "cat": -0.8
  - "dog": -1.1
- Expanding "A":
  - "mouse": -0.2
  - "lion": -1.5
Analyzing Search Algorithm Behavior
Diagnosing a Flaw in Sequence Generation
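The beam update in the worked example above can be sketched directly, assuming (as the question states) that a sequence's score is the sum of its tokens' log-probabilities:

```python
# Beam state and expansions from the worked example, beam width K = 2.
beams = {("The",): -0.5, ("A",): -0.9}
expansions = {
    ("The",): {"cat": -0.8, "dog": -1.1},
    ("A",): {"mouse": -0.2, "lion": -1.5},
}

# Score every candidate: prefix score + next-token log-probability.
candidates = {
    prefix + (tok,): score + logp
    for prefix, score in beams.items()
    for tok, logp in expansions[prefix].items()
}
# "A mouse": -1.1, "The cat": -1.3, "The dog": -1.6, "A lion": -2.4

# Keep the K highest-scoring sequences for the next step.
K = 2
beams = dict(sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:K])
```

Note that "A mouse" (-1.1) overtakes "The cat" (-1.3) even though "The" was the stronger prefix: beam search ranks whole-sequence scores, not per-step winners.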
Top-k Sampling Process
Comparison of Top-p and Top-k Sampling
A language model is generating text and has calculated the following probabilities for potential next tokens:
mat (0.45), rug (0.25), floor (0.15), table (0.10), and window (0.03). If the model uses a decoding strategy where it first identifies the 3 most probable tokens and then randomly samples one token from only that reduced group, which of the following statements is true?
Effect of Candidate Pool Size on Text Generation
A language model is configured to generate text by first selecting a fixed number of the most probable next tokens and then sampling from only that reduced set. If the fixed number of tokens to consider is significantly decreased (e.g., from 100 to 5), what is the most likely impact on the generated text?
argTopK Function
Definition of the Top-k Selection Pool
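A minimal sketch of the strategy described above — top-k sampling with k = 3, using the probabilities from the question. After truncation and softmax-style renormalization, the two lowest-probability tokens ('table' and 'window') can never be chosen:

```python
import random

# Next-token distribution from the question.
probs = {"mat": 0.45, "rug": 0.25, "floor": 0.15, "table": 0.10, "window": 0.03}
k = 3

# Keep the k most probable tokens (the argTopK step).
top_k = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])

# Renormalize so the kept probabilities sum to 1.
total = sum(top_k.values())                          # 0.45 + 0.25 + 0.15 = 0.85
renormed = {tok: p / total for tok, p in top_k.items()}
# 'mat' now has probability 0.45 / 0.85 ≈ 0.529.

# Sample one token from the reduced, renormalized pool.
token = random.choices(list(renormed), weights=list(renormed.values()))[0]
```

Shrinking k (e.g. from 100 to 5, as in the second question) shrinks this pool, making output safer and more repetitive; growing k admits lower-probability tokens and increases diversity.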
Softmax Renormalization in Top-k Sampling
Ranking and Top-p (Nucleus) Sampling Process
Comparison of Top-p and Top-k Sampling
A language model is generating text and has calculated the following probabilities for the next potential token:
{'the': 0.40, 'a': 0.30, 'one': 0.15, 'an': 0.10, 'some': 0.05}. If the model uses a sampling method where it selects from the smallest set of the most likely tokens whose cumulative probability exceeds a threshold of p = 0.75, which set of tokens will it sample from?
Effect of Parameter 'p' on Text Generation
Dynamic Candidate Set in Probabilistic Text Generation
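The candidate set in the question above can be computed with a short loop — a minimal sketch of the truncation step in nucleus (top-p) sampling, where the set size adapts to the shape of the distribution rather than being fixed as in top-k:

```python
# Next-token distribution and threshold from the question.
probs = {"the": 0.40, "a": 0.30, "one": 0.15, "an": 0.10, "some": 0.05}
p = 0.75

# Walk the tokens in descending probability order, keeping the smallest
# prefix whose cumulative probability exceeds p.
nucleus, cumulative = {}, 0.0
for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    nucleus[tok] = prob
    cumulative += prob
    if cumulative > p:
        break
# 0.40 + 0.30 = 0.70 does not exceed 0.75, so 'one' is also included:
# nucleus == {"the": 0.40, "a": 0.30, "one": 0.15}
```

On a flatter distribution the same p would admit more tokens, and on a peakier one fewer — this is the "dynamic candidate set" the title above refers to.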
Balancing Randomness and Coherence in Token Sampling
Using Temperature with Softmax to Control Randomness in Token Selection
Token Sampling from a Conditional Probability Distribution
A language model is calculating the next token's probability distribution over a set of four candidate tokens. The raw output scores (logits) for these tokens are: {Token A: 4.0, Token B: 3.8, Token C: 1.5, Token D: 1.2}. The current generation process uses a temperature parameter β = 1.0. A developer wants to modify the process to make the model's output less predictable and increase the likelihood of selecting Token B relative to Token A. Which of the following adjustments to the temperature parameter β would best achieve this goal?
Effect of Temperature on Probability Distributions
Parameter Tuning for Text Generation Tasks
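A small sketch of the question above, assuming the common convention that the temperature β divides each logit before the softmax, so β > 1 flattens the distribution and β < 1 sharpens it:

```python
import math

# Logits from the question.
logits = {"A": 4.0, "B": 3.8, "C": 1.5, "D": 1.2}

def softmax_with_temperature(logits, beta):
    # Divide each logit by beta, then apply the standard softmax.
    scaled = {t: v / beta for t, v in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    return {t: math.exp(v) / z for t, v in scaled.items()}

cold = softmax_with_temperature(logits, beta=1.0)
hot = softmax_with_temperature(logits, beta=2.0)

# The B/A probability ratio is exp((3.8 - 4.0) / beta):
# about 0.819 at beta = 1.0, about 0.905 at beta = 2.0.
# Raising beta makes B more competitive with A and the output
# less predictable, which is what the developer wants.
```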
A developer is building a system to generate single-sentence headlines for news articles. The initial results are often too brief and lack important details (e.g., generating 'An incident occurred' instead of 'A five-alarm fire broke out at a downtown warehouse'). Which of the following adjustments to the generation process is most likely to encourage the model to produce more descriptive, yet still single-sentence, headlines?
Analyzing the Impact of Length Penalty Variations
Evaluating the Application of Output Length Controls
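One common remedy for the terse-headline problem above is to length-normalize beam scores so that longer sequences are not penalized simply for accumulating more negative log-probabilities. The sketch below uses the GNMT-style penalty ((5 + length) / 6)^α; the candidate headlines and their scores are made up for illustration:

```python
# Hypothetical beam candidates: (sum of token log-probs, length in tokens).
# The terse headline has a higher raw score only because it is shorter.
candidates = {
    "An incident occurred": (-4.0, 3),
    "A five-alarm fire broke out at a downtown warehouse": (-6.5, 9),
}

def length_normalized_score(logprob_sum, length, alpha):
    # GNMT-style length penalty: alpha = 0 leaves raw scores untouched;
    # larger alpha divides by a bigger penalty for longer sequences,
    # which raises (makes less negative) their normalized scores.
    return logprob_sum / (((5 + length) / 6) ** alpha)

winners = {
    a: max(candidates, key=lambda s: length_normalized_score(*candidates[s], a))
    for a in (0.0, 1.0)
}
# alpha = 0.0 picks the terse headline (-4.0 vs -6.5);
# alpha = 1.0 flips the ranking (-3.0 vs about -2.79) toward the
# descriptive one — while still selecting a single sentence.
```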