Concept
Similarity of Nontrivial Attention Kernels
When applied to Nadaraya-Watson regression, the nontrivial attention kernels (Gaussian, Boxcar, and Epanechnikov) produce similar, workable estimates that stay reasonably close to the true underlying function. The similarity arises because, despite their different functional forms, the kernels yield comparable attention weights after normalization, so they pool the data values in roughly the same way.
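The comparison can be sketched with a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reference implementation: the target function `2 * sin(x) + x`, the unit noise level, the evenly spaced training grid, the unit bandwidth, and all function names are choices made here for the example. The Epanechnikov kernel is assumed to take the form `max(0, 1 - |d|)`; the textbook statistical definition is usually written as proportional to `max(0, 1 - d^2)`.

```python
import numpy as np

# Kernels as functions of the signed distance d = x_query - x_train.
# Bandwidth is fixed at 1 for all three (an assumption for this sketch).
def gaussian(d):
    return np.exp(-d**2 / 2)

def boxcar(d):
    return (np.abs(d) < 1.0).astype(float)

def epanechnikov(d):
    # Assumed form max(0, 1 - |d|); see the note in the lead-in.
    return np.maximum(0.0, 1.0 - np.abs(d))

def attention_weights(x_train, x_query, kernel):
    """Row-normalized kernel values: one row of weights per query point."""
    d = x_query[:, None] - x_train[None, :]
    k = kernel(d)
    return k / k.sum(axis=1, keepdims=True)

def nadaraya_watson(x_train, y_train, x_query, kernel):
    """Kernel-weighted average of the training targets."""
    return attention_weights(x_train, x_query, kernel) @ y_train

# Hypothetical toy data: noisy samples of an assumed target function.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 5.0, 40)   # even grid, so every query has neighbors
y_train = 2 * np.sin(x_train) + x_train + rng.normal(size=x_train.shape)
x_query = np.arange(0.0, 5.0, 0.1)

kernels = {"gaussian": gaussian, "boxcar": boxcar, "epanechnikov": epanechnikov}
estimates = {
    name: nadaraya_watson(x_train, y_train, x_query, k)
    for name, k in kernels.items()
}
```

To see how closely the kernels agree on this toy data, one could inspect pairwise gaps such as `np.abs(estimates["gaussian"] - estimates["boxcar"]).max()`, or compare the rows of `attention_weights` for a single query point across kernels.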
Updated 2026-05-14
Tags
D2L
Dive into Deep Learning @ D2L