Example
Example of Beam Search Process
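Consider generating a sequence from a vocabulary of five elements, $\mathcal{Y} = \{A, B, C, D, E\}$, using a beam size of $2$ and a maximum sequence length of $3$. At the first time step, the two tokens with the highest conditional probabilities $P(y_1 \mid \mathbf{c})$, such as $A$ and $C$, are chosen. At the second time step, probabilities for all extensions $y_2 \in \mathcal{Y}$ are computed, for example $P(A, y_2 \mid \mathbf{c}) = P(A \mid \mathbf{c}) P(y_2 \mid A, \mathbf{c})$ and $P(C, y_2 \mid \mathbf{c}) = P(C \mid \mathbf{c}) P(y_2 \mid C, \mathbf{c})$. The algorithm selects the top two combinations overall, such as $P(A, B \mid \mathbf{c})$ and $P(C, E \mid \mathbf{c})$. This expansion repeats at the third time step by computing $P(A, B, y_3 \mid \mathbf{c}) = P(A, B \mid \mathbf{c}) P(y_3 \mid A, B, \mathbf{c})$ and $P(C, E, y_3 \mid \mathbf{c}) = P(C, E \mid \mathbf{c}) P(y_3 \mid C, E, \mathbf{c})$ for all $y_3 \in \mathcal{Y}$, ultimately resulting in the final candidate sequences, like $(A, B, D)$ and $(C, E, D)$.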
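The procedure can be traced with a minimal Python sketch. It assumes a five-token vocabulary labeled A through E, a beam size of 2, and a maximum length of 3. The table `TOY_MODEL` and the helper `next_token_probs` are hypothetical: the probabilities are invented so that the beam keeps A and C, then A B and C E, then A B D and C E D. A real decoder would query a trained model instead, and would also handle an end-of-sequence token, which this sketch omits.

```python
import heapq

VOCAB = ["A", "B", "C", "D", "E"]

# Hypothetical conditional distributions P(next token | prefix, c).
# These numbers are invented for illustration only.
TOY_MODEL = {
    (): {"A": 0.4, "B": 0.1, "C": 0.3, "D": 0.1, "E": 0.1},
    ("A",): {"A": 0.125, "B": 0.5, "C": 0.125, "D": 0.125, "E": 0.125},
    ("C",): {"A": 0.1, "B": 0.1, "C": 0.1, "D": 0.1, "E": 0.6},
    ("A", "B"): {"A": 0.1, "B": 0.1, "C": 0.1, "D": 0.6, "E": 0.1},
    ("C", "E"): {"A": 0.1, "B": 0.1, "C": 0.1, "D": 0.6, "E": 0.1},
}


def next_token_probs(prefix):
    """Return the toy distribution for a prefix (uniform if not listed)."""
    uniform = {tok: 1.0 / len(VOCAB) for tok in VOCAB}
    return TOY_MODEL.get(prefix, uniform)


def beam_search(beam_size, max_len):
    """Return the sequences kept at every time step, with their probabilities."""
    beam = [((), 1.0)]  # start from the empty prefix with probability 1
    candidates = []
    for _ in range(max_len):
        # Extend every sequence in the beam by every vocabulary token;
        # the joint probability is the prefix probability times the
        # conditional probability of the new token.
        extensions = [
            (prefix + (tok,), prob * p)
            for prefix, prob in beam
            for tok, p in next_token_probs(prefix).items()
        ]
        # Keep only the beam_size most probable extensions overall.
        beam = heapq.nlargest(beam_size, extensions, key=lambda item: item[1])
        candidates.extend(beam)
    return candidates


candidates = beam_search(beam_size=2, max_len=3)
for seq, prob in candidates:
    print(" ".join(seq), round(prob, 4))
```

With a beam size of 2 and a vocabulary of 5, each step after the first scores only 10 extensions rather than every possible sequence, which is the saving beam search offers over exhaustive search.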
Updated 2026-05-14
Tags
D2L
Dive into Deep Learning @ D2L