Concept

Kernel Width Effect on Attention Pooling

In attention pooling, the kernel width controls how smooth the estimate is and how strongly it responds to local variation in the data. A narrower kernel concentrates large attention weights on the few keys closest to the query, so the estimate tracks local variation (including noise) more closely and is less smooth. Conversely, a wider kernel spreads attention weights over more keys, averaging over a larger neighborhood and yielding a smoother estimate.
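The effect of kernel width can be illustrated with a minimal sketch of Nadaraya-Watson style attention pooling using a Gaussian kernel. This is an illustration under stated assumptions, not a reference implementation: the function name `gaussian_attention_pooling`, the width parameter `sigma`, and the synthetic data are all hypothetical choices made here. The printed "roughness" (mean absolute second difference of the estimate) is only an informal smoothness proxy.

```python
import numpy as np


def gaussian_attention_pooling(x_query, x_train, y_train, sigma):
    """Attention pooling with a Gaussian kernel of width `sigma` (hypothetical helper).

    Returns the estimates at `x_query` and the attention weight matrix
    of shape (n_query, n_train).
    """
    # Pairwise differences between each query and each key.
    diff = x_query[:, None] - x_train[None, :]
    # Gaussian kernel scores; a smaller sigma penalizes distance more sharply.
    scores = -0.5 * (diff / sigma) ** 2
    # Softmax over keys, shifted by the row maximum for numerical stability.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each estimate is a weighted average of the training targets.
    return weights @ y_train, weights


# Synthetic noisy data (assumed purely for illustration).
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 5, 50))
y_train = 2 * np.sin(x_train) + x_train + rng.normal(0, 1, 50)
x_query = np.linspace(0, 5, 100)

for sigma in (0.1, 0.5, 2.0):
    y_hat, weights = gaussian_attention_pooling(x_query, x_train, y_train, sigma)
    # Informal smoothness proxy: mean absolute second difference of the estimate.
    roughness = np.mean(np.abs(np.diff(y_hat, n=2)))
    # Average of the largest weight per query: higher means more concentrated attention.
    peak = weights.max(axis=1).mean()
    print(f"sigma={sigma}: roughness={roughness:.4f}, mean peak weight={peak:.3f}")
```

With a small `sigma`, each query's weight mass sits on one or two nearby keys, so the estimate follows individual noisy targets. With a large `sigma`, the weights approach a uniform average over all keys and the estimate flattens toward the mean of the targets.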

Updated 2026-05-14

Tags

D2L

Dive into Deep Learning @ D2L