Example

Identity Matrix as Attention Weights Visualization

As a basic sanity check for attention visualizations, an identity matrix can stand in for the attention weights. In this scenario, the attention weight is exactly 1 when the query and the key share the same index and 0 otherwise, representing a perfect one-to-one focus. Because the expected heatmap is a single bright diagonal on an otherwise empty grid, this idealized case makes it easy to verify that the visualization maps each query-key pair to the correct cell.
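The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's own plotting utility: the helper names `identity_attention` and `plot_attention`, the choice of 10 queries and keys, the axis labels, and the `Reds` colormap are all assumptions made for this example. Rows are taken to index queries and columns to index keys.

```python
def identity_attention(n):
    """Return an n x n identity matrix as nested lists.

    Row q, column k holds the weight that query q places on key k:
    1.0 when q == k and 0.0 otherwise.
    """
    return [[1.0 if q == k else 0.0 for k in range(n)] for q in range(n)]


def plot_attention(weights, xlabel="Keys", ylabel="Queries"):
    """Draw the weights as a heatmap (hypothetical helper for this sketch).

    matplotlib is imported here so that building the matrix does not
    require a plotting backend.
    """
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(3, 3))
    image = ax.imshow(weights, cmap="Reds")  # rows = queries, columns = keys
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    fig.colorbar(image, ax=ax, shrink=0.6)
    return fig


# Assumed size for the example: 10 queries attending over 10 keys.
weights = identity_attention(10)
```

Passing `weights` to `plot_attention` and displaying the returned figure should show dark cells only along the main diagonal. A diagonal that appears flipped or transposed would suggest that the query and key axes are swapped or that the image origin is not where the plotting code expects it.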

Updated 2026-05-14

Tags

D2L

Dive into Deep Learning @ D2L