Concept

Normalization-free transformer

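Replaces the layer normalization (LN) module with a learnable residual connection, $H' = H + \alpha \cdot F(H)$, where $\alpha$ is a learnable scaling parameter and $F$ is the sublayer applied to the hidden state $H$. This has been shown to lead to faster convergence.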
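The update rule above can be sketched in a few lines of plain Python. This is only an illustrative sketch: `residual_update` and `toy_sublayer` are hypothetical names, $F$ is replaced by a toy function rather than a real attention or feed-forward sublayer, and $\alpha$ is passed as a fixed float, whereas in training it would be a parameter updated by gradient descent.

```python
from typing import Callable, List

Vector = List[float]


def residual_update(h: Vector, f: Callable[[Vector], Vector], alpha: float) -> Vector:
    """Return H' = H + alpha * F(H), with no layer normalization applied."""
    fh = f(h)
    return [h_i + alpha * f_i for h_i, f_i in zip(h, fh)]


def toy_sublayer(h: Vector) -> Vector:
    """Hypothetical stand-in for F: doubles every entry of the hidden state."""
    return [2.0 * x for x in h]


h = [1.0, -2.0, 0.5]

# Assumption: alpha = 0 makes the layer an identity map on H.
identity_out = residual_update(h, toy_sublayer, alpha=0.0)

# A nonzero alpha mixes the sublayer output back into the residual stream.
scaled_out = residual_update(h, toy_sublayer, alpha=0.5)

print(identity_out)
print(scaled_out)
```

With $\alpha = 0$ the layer passes $H$ through unchanged, so the contribution of $F$ grows only as $\alpha$ moves away from zero. Starting $\alpha$ at zero is an assumption made for this sketch, not something the text above specifies.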

Updated 2022-05-26

Tags

Data Science
