Nadaraya-Watson Regression as an Attention Precursor
Nadaraya-Watson kernel regression is an early precursor of modern attention mechanisms. Because it is a nonparametric estimator, it can be applied directly to regression or classification tasks with little to no training or hyperparameter tuning. Each prediction is a weighted average of the observed values, where the attention weight on a value depends on the similarity (or distance) between the query and the corresponding key, and on how many similar observations are available in the dataset.
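The idea can be illustrated with a minimal sketch in plain Python. It assumes scalar inputs and a Gaussian kernel with a hand-picked bandwidth; the function names (`gaussian_kernel`, `attention_weights`, `nadaraya_watson`) and the toy data are hypothetical and chosen only for illustration, not taken from the source.

```python
import math

def gaussian_kernel(distance, bandwidth=1.0):
    """Unnormalized Gaussian kernel: larger output means 'more similar'."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def attention_weights(query, keys, bandwidth=1.0):
    """Kernel similarity between the query and each key, normalized to sum to 1.

    Assumes at least one key is close enough that the scores do not all
    underflow to zero.
    """
    scores = [gaussian_kernel(query - k, bandwidth) for k in keys]
    total = sum(scores)
    return [s / total for s in scores]

def nadaraya_watson(query, keys, values, bandwidth=1.0):
    """Predict at `query` as the attention-weighted average of the values."""
    weights = attention_weights(query, keys, bandwidth)
    return sum(w * v for w, v in zip(weights, values))

# Toy data (made up for illustration): keys are inputs, values are targets.
keys = [0.0, 1.0, 2.0, 3.0]
values = [0.0, 1.0, 4.0, 9.0]

weights = attention_weights(1.0, keys, bandwidth=0.5)
prediction = nadaraya_watson(1.0, keys, values, bandwidth=0.5)
```

Nothing here is learned: the only knob is the bandwidth, which controls how quickly the weights fall off with distance. Because the weights are normalized over all keys, a query surrounded by many nearby observations spreads its weight across them, while a query in a sparse region leans on whichever few keys are closest.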
Updated 2026-05-14
Tags
D2L
Dive into Deep Learning @ D2L