
Batch Matrix Multiplication for Skip-Gram Dot Products

To compute the dot products between each center word vector and its context or noise word vectors efficiently, the skip-gram model uses batch matrix multiplication. The center word vectors have shape $(\text{batch size}, 1, \text{embed size})$ and the context and noise word vectors have shape $(\text{batch size}, \text{max\_len}, \text{embed size})$. Swapping the last two axes of the latter gives $(\text{batch size}, \text{embed size}, \text{max\_len})$, so a single batched matrix product computes, for every example in the minibatch, the dot product of its center vector with each of its context and noise vectors. The output is a tensor of shape $(\text{batch size}, 1, \text{max\_len})$ holding the prediction scores.
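As a quick check of the shape argument, the minimal sketch below uses small random tensors (the sizes 2, 4 and 3 are arbitrary assumptions). It confirms that `torch.bmm` on the axis-swapped context/noise tensor gives the same scores as computing each dot product explicitly with broadcasting.

```python
import torch

torch.manual_seed(0)
batch_size, max_len, embed_size = 2, 4, 3  # arbitrary toy sizes (assumption)

v = torch.randn(batch_size, 1, embed_size)        # center word vectors
u = torch.randn(batch_size, max_len, embed_size)  # context + noise word vectors

# Batched matmul: (b, 1, e) @ (b, e, max_len) -> (b, 1, max_len)
scores = torch.bmm(v, u.permute(0, 2, 1))

# Same scores via broadcasting: multiply elementwise, sum over the embedding axis,
# then move the max_len axis last to match the bmm output layout
explicit = (v * u).sum(dim=-1, keepdim=True).permute(0, 2, 1)

print(scores.shape)  # torch.Size([2, 1, 4])
print(torch.allclose(scores, explicit, atol=1e-6))
```

The broadcasting version materializes a `(batch size, max_len, embed size)` intermediate before summing. The `bmm` call expresses the same computation as one batched matrix product.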

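```python
import torch

def skip_gram(center, contexts_and_negatives, embed_v, embed_u):
    v = embed_v(center)                       # (batch size, 1, embed size)
    u = embed_u(contexts_and_negatives)       # (batch size, max_len, embed size)
    # Swap u's last two axes, then batch-multiply: (b, 1, e) @ (b, e, max_len)
    pred = torch.bmm(v, u.permute(0, 2, 1))   # (batch size, 1, max_len)
    return pred
```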
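A hedged usage sketch follows. It assumes `nn.Embedding` layers for `embed_v` and `embed_u`, a toy vocabulary of 20 tokens, 4-dimensional embeddings, and arbitrary token indices. The function is restated so the snippet runs on its own.

```python
import torch
from torch import nn

def skip_gram(center, contexts_and_negatives, embed_v, embed_u):
    # Restated from the function above so this sketch runs standalone
    v = embed_v(center)                       # (batch size, 1, embed size)
    u = embed_u(contexts_and_negatives)       # (batch size, max_len, embed size)
    return torch.bmm(v, u.permute(0, 2, 1))   # (batch size, 1, max_len)

# Toy setup (assumed sizes): vocabulary of 20 tokens, 4-dimensional embeddings
vocab_size, embed_size = 20, 4
embed_v = nn.Embedding(vocab_size, embed_size)  # center word embeddings
embed_u = nn.Embedding(vocab_size, embed_size)  # context/noise word embeddings

# Batch of 2 center words, each paired with max_len = 4 context/noise indices
center = torch.tensor([[1], [5]])                                     # (2, 1)
contexts_and_negatives = torch.tensor([[2, 3, 7, 9], [4, 6, 8, 10]])  # (2, 4)

pred = skip_gram(center, contexts_and_negatives, embed_v, embed_u)
print(pred.shape)  # torch.Size([2, 1, 4])
```

Each entry `pred[i, 0, j]` is the dot product of the center vector for example `i` with its `j`-th context or noise vector. Keeping separate `embed_v` and `embed_u` tables gives every word distinct center and context representations.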


Updated 2026-05-25


Tags

D2L

Dive into Deep Learning @ D2L