To compute the dot products between a center word vector and its context or noise word vectors efficiently, the skip-gram model uses batch matrix multiplication. The embedded context and noise words have shape $$(\text{batch size}, \text{max\_len}, \text{embed size})$$; permuting their last two axes and batch-multiplying with the embedded center words of shape $$(\text{batch size}, 1, \text{embed size})$$ yields, for every example in the minibatch, the dot product of its center word vector with each of its $$\text{max\_len}$$ context or noise word vectors in a single operation. The result is a tensor of shape $$(\text{batch size}, 1, \text{max\_len})$$ holding the prediction scores.

```python
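import torch

def skip_gram(center, contexts_and_negatives, embed_v, embed_u):
    v = embed_v(center)                  # (batch_size, 1, embed_size)
    u = embed_u(contexts_and_negatives)  # (batch_size, max_len, embed_size)
    # (B, 1, D) @ (B, D, max_len) -> (B, 1, max_len)
    pred = torch.bmm(v, u.permute(0, 2, 1))
    return pred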
```
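
To make the equivalence concrete, the following minimal sketch compares the batched product with an explicit loop of per-pair dot products. The tensor sizes and the random inputs are illustrative assumptions, not values from the text; only `torch.bmm`, `permute`, and `torch.dot` are used.

```python
import torch

torch.manual_seed(0)
batch_size, max_len, embed_size = 2, 4, 3  # illustrative sizes (assumed)

v = torch.randn(batch_size, 1, embed_size)        # stand-in center word vectors
u = torch.randn(batch_size, max_len, embed_size)  # stand-in context and noise vectors

# Batched product: (B, 1, D) @ (B, D, max_len) -> (B, 1, max_len)
pred = torch.bmm(v, u.permute(0, 2, 1))

# Reference: one explicit dot product per (example, position) pair
expected = torch.empty(batch_size, 1, max_len)
for b in range(batch_size):
    for j in range(max_len):
        expected[b, 0, j] = torch.dot(v[b, 0], u[b, j])

matches = torch.allclose(pred, expected, atol=1e-6)
```

Here `matches` should be true: each entry `pred[b, 0, j]` is the dot product of example `b`'s center vector with its `j`-th context or noise vector, and no products are formed across different examples.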

In the forward propagation of the skip-gram model, the input consists of center word indices of shape $$(\text{batch size}, 1)$$ and concatenated context and noise word indices of shape $$(\text{batch size}, \text{max\_len})$$. Each set of indices is first mapped to dense vectors by its own embedding layer (`embed_v` for center words, `embed_u` for context and noise words). A batch matrix multiplication is then performed between the embedded center words and the embedded context and noise words. The output has shape $$(\text{batch size}, 1, \text{max\_len})$$, and each element is the dot product between a center word vector and one context or noise word vector.
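
The shapes described above can be checked with a small forward pass. This is a sketch under stated assumptions: the vocabulary size, embedding size, batch size, and `max_len` are made-up values, the indices are random, and `skip_gram` is repeated here only so the snippet runs on its own.

```python
import torch
from torch import nn

def skip_gram(center, contexts_and_negatives, embed_v, embed_u):
    v = embed_v(center)                  # (batch_size, 1, embed_size)
    u = embed_u(contexts_and_negatives)  # (batch_size, max_len, embed_size)
    return torch.bmm(v, u.permute(0, 2, 1))

# Assumed toy sizes for illustration only
vocab_size, embed_size = 20, 4
batch_size, max_len = 2, 4

embed_v = nn.Embedding(vocab_size, embed_size)  # center word embeddings
embed_u = nn.Embedding(vocab_size, embed_size)  # context/noise word embeddings

# Random integer indices standing in for real minibatch data
center = torch.randint(0, vocab_size, (batch_size, 1))
contexts_and_negatives = torch.randint(0, vocab_size, (batch_size, max_len))

scores = skip_gram(center, contexts_and_negatives, embed_v, embed_u)
print(scores.shape)  # torch.Size([2, 1, 4])
```

Using two separate `nn.Embedding` layers mirrors the two arguments `embed_v` and `embed_u` of the function, so a word can have different vectors as a center word and as a context word.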
