Batch Matrix Multiplication for Skip-Gram Dot Products
To efficiently compute the dot products between a center word vector and multiple context or noise word vectors in the skip-gram model, deep learning frameworks use batch matrix multiplication. The context and noise word vectors have shape (batch size, max_len, embedding size). Permuting their last two axes and multiplying by the center word vectors, of shape (batch size, 1, embedding size), computes in a single call the dot product of each example's center word vector with every one of that example's context and noise word vectors. The output is a tensor of shape (batch size, 1, max_len), where each element is the prediction score for one pairing of a center word with a context or noise word.
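import torch

def skip_gram(center, contexts_and_negatives, embed_v, embed_u):
    # center: (batch size, 1); contexts_and_negatives: (batch size, max_len)
    v = embed_v(center)  # (batch size, 1, embedding size)
    u = embed_u(contexts_and_negatives)  # (batch size, max_len, embedding size)
    # Batch matrix multiplication yields shape (batch size, 1, max_len)
    pred = torch.bmm(v, u.permute(0, 2, 1))
    return pred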
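As a quick sanity check, the function above can be run on small random inputs to confirm the output shape and to verify that the batch matrix multiplication matches dot products computed one pair at a time. This is a minimal sketch: the vocabulary size, embedding size, batch size, and max_len values are hypothetical and chosen only for illustration, and the two nn.Embedding layers stand in for embed_v and embed_u.

```python
import torch
from torch import nn


def skip_gram(center, contexts_and_negatives, embed_v, embed_u):
    v = embed_v(center)                      # (batch size, 1, embedding size)
    u = embed_u(contexts_and_negatives)      # (batch size, max_len, embedding size)
    return torch.bmm(v, u.permute(0, 2, 1))  # (batch size, 1, max_len)


torch.manual_seed(0)

# Hypothetical sizes, chosen only for this illustration.
vocab_size, embed_size = 20, 4
batch_size, max_len = 2, 4

embed_v = nn.Embedding(vocab_size, embed_size)  # center word embeddings
embed_u = nn.Embedding(vocab_size, embed_size)  # context/noise word embeddings

# Random token indices standing in for a real minibatch.
center = torch.randint(0, vocab_size, (batch_size, 1))
contexts_and_negatives = torch.randint(0, vocab_size, (batch_size, max_len))

pred = skip_gram(center, contexts_and_negatives, embed_v, embed_u)
pred_shape = tuple(pred.shape)

# Reference computation: one explicit dot product per (center, context/noise) pair.
v = embed_v(center)
u = embed_u(contexts_and_negatives)
expected = torch.stack([
    torch.stack([torch.dot(v[b, 0], u[b, j]) for j in range(max_len)])
    for b in range(batch_size)
]).unsqueeze(1)  # reshape to (batch size, 1, max_len)

matches = torch.allclose(pred, expected, atol=1e-6)
print(pred_shape, matches)
```

The explicit loop is only there for comparison. torch.bmm produces the same values in one call, which is why the batched form is preferred in practice.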
Updated 2026-05-25
Tags
D2L
Dive into Deep Learning @ D2L