A language model generates an output sequence one token at a time, where each new token's probability depends on prior information. If the model has already produced the first three tokens of an output based on a given input sequence, which of the following best describes the complete set of information used to calculate the probability for the fourth token?
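The conditioning described above can be sketched in code: the distribution for the fourth output token is a function of the entire input sequence together with the three output tokens already generated. This is a minimal toy illustration with a hypothetical stand-in scoring function, not any specific model's implementation:

```python
def next_token_probs(context):
    # Hypothetical stand-in for a language model: returns a probability
    # distribution over a tiny vocabulary. Any function of the *entire*
    # context works for illustration; real models use a neural network.
    vocab = ["a", "b", "c"]
    scores = [(len(context) + i) % 3 + 1 for i, _ in enumerate(vocab)]
    total = sum(scores)
    return {tok: s / total for tok, s in zip(vocab, scores)}

input_tokens = ["the", "cat", "sat"]   # the given input sequence
generated = ["y1", "y2", "y3"]         # first three output tokens already produced

# The complete set of information for the 4th token's probability:
# the input sequence plus all previously generated output tokens.
context = input_tokens + generated
p4 = next_token_probs(context)
```

The key point the sketch encodes is that `context` contains both the input sequence and every output token produced so far; nothing else feeds into `p4`.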
Ch.2 Generative Models - Foundations of Large Language Models