Code

Self-Attention Tensor Computation Example

To implement self-attention with a multi-head attention module, pass the same input tensor as the queries, keys, and values. For a batch of sequences represented by a tensor X of shape (batch_size, num_steps, num_hiddens), self-attention outputs a tensor of the same shape. The following PyTorch snippet demonstrates this with the d2l MultiHeadAttention class. Here num_queries plays the role of num_steps, and valid_lens gives the number of non-padding tokens in each sequence:

    import torch
    from d2l import torch as d2l

    num_hiddens, num_heads = 100, 5
    # The third argument (0.5) is the dropout rate
    attention = d2l.MultiHeadAttention(num_hiddens, num_heads, 0.5)
    batch_size, num_queries, valid_lens = 2, 4, torch.tensor([3, 2])
    X = torch.ones((batch_size, num_queries, num_hiddens))
    # Self-attention passes X as queries, keys, and values
    d2l.check_shape(attention(X, X, X, valid_lens),
                    (batch_size, num_queries, num_hiddens))
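To make the shape claim concrete without any library, the following is a minimal sketch of self-attention in plain Python. It is a simplification, not the d2l implementation: it assumes a single head, no learned projections, and no dropout. The helper names masked_softmax and self_attention are hypothetical and chosen only for this illustration.

```python
import math


def masked_softmax(scores, valid_len):
    """Softmax over one row of scores, ignoring positions >= valid_len."""
    # Masked positions get a very negative score, so their weight becomes 0.
    masked = [s if j < valid_len else -1e6 for j, s in enumerate(scores)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, valid_lens):
    """Single-head scaled dot-product self-attention on nested lists.

    X has shape (batch_size, num_steps, num_hiddens) and valid_lens holds
    one length per sequence. Queries, keys, and values are all X itself.
    """
    outputs = []
    for seq, valid_len in zip(X, valid_lens):
        d = len(seq[0])
        seq_out = []
        for query in seq:
            # One score per key: scaled dot product of the query with that key.
            scores = [
                sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                for key in seq
            ]
            weights = masked_softmax(scores, valid_len)
            # Output is the attention-weighted sum of the value vectors.
            seq_out.append([
                sum(w * value[h] for w, value in zip(weights, seq))
                for h in range(d)
            ])
        outputs.append(seq_out)
    return outputs


batch_size, num_steps, num_hiddens = 2, 4, 6
X = [[[1.0] * num_hiddens for _ in range(num_steps)] for _ in range(batch_size)]
valid_lens = [3, 2]

out = self_attention(X, valid_lens)
shape = (len(out), len(out[0]), len(out[0][0]))
print(shape)  # same as (batch_size, num_steps, num_hiddens)
```

Each query produces exactly one output vector with the same width as the values, which is why the output shape matches the input shape. The valid lengths only change which keys receive nonzero weight, not the shape of the result.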

Updated 2026-05-14

Tags

D2L

Dive into Deep Learning @ D2L