Learn Before
Relation
Attention Functions Used
- Scaled Dot-Product Attention: computes the attention function on a set of queries simultaneously, packed together into a matrix.
- Multi-Head Attention: allows the model to jointly attend to information from different representation subspaces at different positions.
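The first bullet can be sketched in code. A minimal NumPy version of scaled dot-product attention, following the standard formula Attention(Q, K, V) = softmax(QKᵀ / √d_k) V; the matrix shapes and random toy data here are illustrative assumptions, not from the note itself:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # (n_queries, n_keys) similarity scores
    # numerically stable row-wise softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                           # weighted sum of value vectors

# toy example: 2 queries attending over 3 key/value pairs, d_k = 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```

Packing all queries into the matrix Q is what lets the whole set be attended to in one pair of matrix multiplications; multi-head attention simply runs several such attentions in parallel on learned projections of Q, K, and V and concatenates the results.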
Updated 2021-08-19
Tags
Data Science