
Nadaraya-Watson Regression as an Attention Precursor

Nadaraya-Watson kernel regression is an early precursor of modern attention mechanisms. It can be applied directly to regression or classification tasks with little to no training or hyperparameter tuning. Its prediction for a query is a weighted average of the values in the dataset. Each weight, the attention weight, depends on how similar (or close) the query is to the corresponding key, and also on how many similar observations the dataset contains: because the kernel scores are normalized over all keys, a nearby key receives less weight when many other keys are also close to the query.
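A minimal sketch of the idea on one-dimensional data, assuming a Gaussian kernel with a bandwidth parameter `h` (both are illustrative choices; other kernels work the same way). The prediction is f(q) = Σ_i α(q, k_i) v_i, where α(q, k_i) = K(q, k_i) / Σ_j K(q, k_j). With a Gaussian kernel this normalization is a softmax over the scores -(q - k_i)^2 / (2h^2). The function names `attention_weights` and `nadaraya_watson` are hypothetical, not taken from any library.

```python
import math
from typing import List


def attention_weights(query: float, keys: List[float], bandwidth: float = 1.0) -> List[float]:
    """Normalized Gaussian-kernel (attention) weights of every key for one query."""
    # Log of the unnormalized Gaussian kernel: -(q - k)^2 / (2 h^2).
    scores = [-((query - k) ** 2) / (2 * bandwidth ** 2) for k in keys]
    # Subtract the max score so the largest term becomes exp(0) = 1; this avoids
    # a 0/0 division when the query is far from every key (standard softmax trick).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    # Normalizing over all keys is what makes a key's weight depend on how many
    # other keys are also close to the query.
    return [e / total for e in exps]


def nadaraya_watson(query: float, keys: List[float], values: List[float],
                    bandwidth: float = 1.0) -> float:
    """Predict f(query) as the attention-weighted average of the values."""
    weights = attention_weights(query, keys, bandwidth)
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    # Toy observations: keys are inputs x_i, values are targets y_i = x_i^2.
    keys = [0.0, 1.0, 2.0, 3.0]
    values = [0.0, 1.0, 4.0, 9.0]
    print(attention_weights(1.5, keys))
    print(nadaraya_watson(1.5, keys, values))
```

Nothing is fitted here: the dataset itself is the "model", and the bandwidth is the only knob. A very small bandwidth concentrates almost all weight on the nearest key (close to nearest-neighbour lookup), while a very large one spreads the weight evenly and the prediction approaches the plain mean of the values.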


Updated 2026-05-14


Tags

D2L

Dive into Deep Learning @ D2L