Attention Functions Used

  1. Scaled Dot-Product Attention: computes the attention function on a set of queries simultaneously, packed together into a matrix Q; the keys and values are likewise packed into matrices K and V.
  2. Multi-Head Attention: allows the model to jointly attend to information from different representation subspaces at different positions (see the sketch after this list).
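
Both functions come from "Attention Is All You Need" (Vaswani et al., 2017), where scaled dot-product attention is defined as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, and multi-head attention runs several such attentions over learned linear projections of Q, K, and V before concatenating the results. Below is a minimal NumPy sketch of both; the function names, the random placeholder weights, and the seed are illustrative assumptions (a trained model learns the projections), while d_model = 512 and 8 heads match the paper's base configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    return softmax(scores) @ V

def multi_head_attention(Q, K, V, num_heads, rng):
    # Project Q, K, V into num_heads lower-dimensional subspaces,
    # attend in each subspace in parallel, concatenate the results,
    # and apply a final output projection. The weights here are random
    # placeholders; in a real model they are learned parameters.
    d_model = Q.shape[-1]
    d_k = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
        heads.append(scaled_dot_product_attention(Q @ Wq, K @ Wk, V @ Wv))
    Wo = rng.standard_normal((d_model, d_model))
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 512))  # 10 positions, d_model = 512
out = multi_head_attention(x, x, x, num_heads=8, rng=rng)  # self-attention
print(out.shape)  # (10, 512)
```

Passing the same tensor as Q, K, and V, as in the usage line above, gives self-attention; cross-attention simply supplies different sources for the queries and the key/value pairs.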

Updated 2021-08-19

Tags

Data Science