Learn Before
A language model is generating the next word in a sequence and has calculated the initial probabilities for six candidate words: 'the' (0.40), 'a' (0.25), 'an' (0.15), 'some' (0.10), 'any' (0.05), and 'every' (0.05). The system uses top-k sampling with k = 4, so it considers only the 4 most likely candidates for the final selection. After discarding the other candidates, the probabilities of the remaining words are renormalized to sum to 1. What is the adjusted probability for the word 'a'?
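The renormalization in the question can be sketched as follows. This is a minimal illustration using the probabilities given above (not any particular library's top-k implementation): keep the k highest-probability words, then divide each surviving probability by their sum.

```python
# Top-k filtering and renormalization, using the question's values (k = 4).
probs = {'the': 0.40, 'a': 0.25, 'an': 0.15, 'some': 0.10,
         'any': 0.05, 'every': 0.05}
k = 4

# Keep the k most probable candidates.
top_k = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])

# Renormalize so the survivors sum to 1.
total = sum(top_k.values())          # 0.40 + 0.25 + 0.15 + 0.10 = 0.90
adjusted = {w: p / total for w, p in top_k.items()}

print(round(adjusted['a'], 4))       # 0.25 / 0.90 ≈ 0.2778
```

So the adjusted probability for 'a' is 0.25 / 0.90 ≈ 0.28.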
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A text generation model uses a method to select the next word where it only considers a small, fixed number of the most probable options. Arrange the following steps to accurately describe the sequence of this method.
Inferring Decoding Parameters