Learn Before
Concept
Cross-Attention Layer
In the cross-attention layer of the transformer implementation for the encoder-decoder architecture, the final output of the encoder is multiplied by the cross-attention layer's key weights $W_K$ and value weights $W_V$, but the output from the prior decoder layer is multiplied by the cross-attention layer's query weights $W_Q$:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

where $d_k$ is the dimension of the key vector.
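The flow described above can be sketched in NumPy. This is a minimal single-head illustration (the function name `cross_attention` and the tensor shapes are chosen for the example, not taken from any particular library): queries come from the prior decoder layer, while keys and values come from the encoder's final output.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_output, W_q, W_k, W_v):
    Q = decoder_states @ W_q   # queries from the prior decoder layer
    K = encoder_output @ W_k   # keys from the encoder's final output
    V = encoder_output @ W_v   # values from the encoder's final output
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (dec_len, enc_len)
    weights = softmax(scores, axis=-1)   # each decoder position attends over encoder positions
    return weights @ V                   # (dec_len, d_v)

# Illustrative dimensions (hypothetical, for the sketch only).
rng = np.random.default_rng(0)
d_model, d_k, d_v = 8, 4, 4
enc = rng.normal(size=(5, d_model))   # encoder output: 5 source positions
dec = rng.normal(size=(3, d_model))   # prior decoder layer output: 3 target positions
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_v))

out = cross_attention(dec, enc, W_q, W_k, W_v)
print(out.shape)  # one output vector per decoder position: (3, 4)
```

Note that the output has one row per decoder position, while the attention weights span the encoder positions; this is how the decoder draws on the source sequence at every step.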
Updated 2021-12-05
Tags
Data Science