Example of Beam Search Process

Consider generating a sequence over a vocabulary of five tokens, $\mathcal{Y} = \{A, B, C, D, E\}$, with beam size $k = 2$ and maximum sequence length $3$. At the first time step, the algorithm selects the two tokens with the highest conditional probabilities $P(y_1 \mid \mathbf{c})$, say $A$ and $C$. At the second time step, it computes the probabilities of all $k|\mathcal{Y}| = 2 \times 5 = 10$ possible extensions. For example, $P(A, y_2 \mid \mathbf{c}) = P(A \mid \mathbf{c})P(y_2 \mid A, \mathbf{c})$ and $P(C, y_2 \mid \mathbf{c}) = P(C \mid \mathbf{c})P(y_2 \mid C, \mathbf{c})$ for each $y_2 \in \mathcal{Y}$. It keeps the two highest-scoring sequences among these ten, say $A, B$ and $C, E$. The same expansion is repeated at the third time step by computing $P(A, B, y_3 \mid \mathbf{c})$ and $P(C, E, y_3 \mid \mathbf{c})$ for each $y_3 \in \mathcal{Y}$. This yields the final candidate sequences, such as $A, B, D$ and $C, E, D$.
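The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. The conditional model `toy_next_token_probs` is a hypothetical stand-in for $P(y_t \mid y_1, \ldots, y_{t-1}, \mathbf{c})$ that uses arbitrary deterministic weights. Its output is not meant to reproduce the specific sequences $A, B, D$ and $C, E, D$ from the example. Scores are accumulated as log-probabilities, which is equivalent to multiplying the conditional probabilities along each sequence.

```python
import math
from typing import Callable, Dict, List, Tuple

VOCAB = ["A", "B", "C", "D", "E"]

Seq = Tuple[str, ...]


def toy_next_token_probs(prefix: Seq) -> Dict[str, float]:
    """Hypothetical stand-in for P(y_t | y_1..y_{t-1}, c).

    Assigns deterministic, prefix-dependent positive weights to each token
    and normalizes them into a probability distribution.
    """
    seed = sum(ord(ch) for ch in prefix) + len(prefix)
    weights = {tok: ((ord(tok) * 7 + seed * 3) % 11) + 1 for tok in VOCAB}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def sequence_log_prob(next_probs: Callable[[Seq], Dict[str, float]], seq: Seq) -> float:
    """Chain rule: log P(y_1..y_T | c) = sum_t log P(y_t | y_1..y_{t-1}, c)."""
    total = 0.0
    for t in range(len(seq)):
        total += math.log(next_probs(seq[:t])[seq[t]])
    return total


def beam_search(
    next_probs: Callable[[Seq], Dict[str, float]], k: int = 2, max_len: int = 3
) -> List[Tuple[Seq, float]]:
    """Keep the k highest-scoring partial sequences at every time step."""
    beams: List[Tuple[Seq, float]] = [((), 0.0)]  # (sequence, log-probability)
    for _ in range(max_len):
        candidates: List[Tuple[Seq, float]] = []
        for seq, logp in beams:
            # Extend every surviving sequence by every token in the vocabulary
            for tok, p in next_probs(seq).items():
                candidates.append((seq + (tok,), logp + math.log(p)))
        # Select the k best extensions overall (k * |VOCAB| candidates after step 1)
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:k]
    return beams


if __name__ == "__main__":
    for seq, logp in beam_search(toy_next_token_probs, k=2, max_len=3):
        print(", ".join(seq), f"P = {math.exp(logp):.4f}")
```

With `k=2`, each step scores at most $2 \times 5 = 10$ candidates, whereas exhaustive search would score all $5^3 = 125$ length-3 sequences. Setting `k` to $5^3$ turns the sketch into exhaustive search, which makes it a convenient sanity check.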

Updated 2026-05-14

Tags

D2L

Dive into Deep Learning @ D2L