Learn Before
Concept
Kernel Width Effect on Attention Pooling
In attention pooling, the width of the kernel dictates the smoothness of the estimate and its responsiveness to local variations. A narrower kernel restricts the range of large attention weights, leading to a less smooth estimate that adapts more closely to local data variations. Conversely, a wider kernel distributes attention weights more broadly, resulting in a smoother overall estimate.
0
1
Updated 2026-05-14
Tags
D2L
Dive into Deep Learning @ D2L