Concept

Similarity of Nontrivial Attention Kernels

When used in Nadaraya-Watson regression, nontrivial attention kernels such as the Gaussian, Boxcar, and Epanechnikov kernels produce similar, workable estimates that stay reasonably close to the true underlying function. The similarity arises because, despite their different functional forms, these kernels yield comparable attention weights after normalization: each concentrates its weight on keys near the query, so they pool the data values in roughly the same way.
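The comparison can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation. The kernel formulas, the unit bandwidth, the test function `f(x) = 2 sin(x) + x`, the noise level, and the helper name `nadaraya_watson` are all choices made here for illustration. In particular, the Epanechnikov kernel is written as `max(0, 1 - |d|)`; the classical statistical definition uses `1 - d**2` on the same support.

```python
import numpy as np

# Kernel forms below are assumptions for this sketch; d is the
# signed distance between a query and a key (unit bandwidth).
def gaussian(d):
    return np.exp(-d**2 / 2.0)

def boxcar(d):
    return (np.abs(d) < 1.0).astype(float)

def epanechnikov(d):
    # Assumed form max(0, 1 - |d|); the classical definition uses 1 - d**2.
    return np.maximum(0.0, 1.0 - np.abs(d))

def nadaraya_watson(x_train, y_train, x_query, kernel):
    """Return (estimates, attention weights) for the given kernel."""
    dists = x_query[:, None] - x_train[None, :]      # shape (n_query, n_train)
    scores = kernel(dists)
    # Normalize each row so the weights for one query sum to 1.
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ y_train, weights

def f(x):
    # Hypothetical ground-truth function used only for this illustration.
    return 2.0 * np.sin(x) + x

rng = np.random.default_rng(0)
# An evenly spaced training grid guarantees every query has neighbors
# inside the compact-support kernels, so no row of scores is all zero.
x_train = np.linspace(0.0, 5.0, 40)
y_train = f(x_train) + rng.normal(0.0, 0.5, size=x_train.shape)
x_query = np.linspace(0.0, 5.0, 50)

kernels = {"gaussian": gaussian, "boxcar": boxcar, "epanechnikov": epanechnikov}
estimates = {}
for name, kernel in kernels.items():
    y_hat, _ = nadaraya_watson(x_train, y_train, x_query, kernel)
    estimates[name] = y_hat
    mae = np.mean(np.abs(y_hat - f(x_query)))
    print(f"{name:>13}: mean abs error vs. true function = {mae:.3f}")

# Mean absolute gap between each pair of kernel estimates.
names = list(kernels)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        gap = np.mean(np.abs(estimates[a] - estimates[b]))
        print(f"{a} vs {b}: mean abs gap = {gap:.3f}")
```

Running the script prints each kernel's error against the assumed true function, followed by the pairwise gaps between the estimates. Because every estimate is a convex combination of the training targets, it always lies between the smallest and largest observed `y` value, whichever kernel is used.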

Updated 2026-05-14

Tags

D2L

Dive into Deep Learning @ D2L