Input Embedding Formula in BERT-like Models
In BERT models, the input is a sequence of embeddings, where each individual embedding, denoted as e, is the sum of the token embedding (x), the positional embedding (p), and the segment embedding (s). The mathematical formula for this composition is: e = x + p + s.
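A minimal sketch of this composition in PyTorch is below; the sizes (vocabulary 30,522, maximum length 512, d_model 768) mirror BERT-base but are assumptions here, not values read from any particular checkpoint.

```python
# Sketch of BERT-style input embedding composition: e = x + p + s.
import torch
import torch.nn as nn

vocab_size, max_len, num_segments, d_model = 30522, 512, 2, 768

token_emb = nn.Embedding(vocab_size, d_model)    # x: token identity
pos_emb   = nn.Embedding(max_len, d_model)       # p: position in the sequence
seg_emb   = nn.Embedding(num_segments, d_model)  # s: sentence A vs. sentence B

token_ids   = torch.tensor([[101, 2023, 2003, 102]])        # (batch=1, seq_len=4)
positions   = torch.arange(token_ids.size(1)).unsqueeze(0)  # [[0, 1, 2, 3]]
segment_ids = torch.zeros_like(token_ids)                   # all tokens in segment A

# The three tables share d_model, so the lookups can be summed elementwise.
e = token_emb(token_ids) + pos_emb(positions) + seg_emb(segment_ids)
print(e.shape)  # torch.Size([1, 4, 768])
```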
Related
An NLP engineer is developing a new language model for a specialized domain with a limited amount of training data. They are deciding on the dimensionality of the vectors used to represent tokens. What is the most critical trade-off they must consider when choosing between a higher-dimensional vector (e.g., 1024) versus a lower-dimensional one (e.g., 128)?
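As a rough illustration of the memory side of that trade-off, the sketch below compares the embedding-table size alone for the two dimensionalities, assuming a 30,522-entry WordPiece vocabulary (the figure used by the original BERT release); capacity and overfitting risk on limited data are the other side of the same trade-off.

```python
# Back-of-the-envelope embedding-table sizes for two candidate dimensions.
vocab_size = 30522  # assumed WordPiece vocabulary size

for d in (128, 1024):
    params = vocab_size * d
    print(f"d={d:4d}: {params:>10,} parameters (~{params * 4 / 1e6:.1f} MB at fp32)")

# d= 128:  3,906,816 parameters (~15.6 MB at fp32)
# d=1024: 31,254,528 parameters (~125.0 MB at fp32)
```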
Diagnosing an Input Vector Mismatch
A data scientist is configuring a new transformer-based model for a sentence-pair classification task. They have defined the dimensions of the input vector components as follows: {'token_embedding_dim': 768, 'positional_embedding_dim': 768, 'segment_embedding_dim': 2}. Based on the standard architecture for such models, what is the fundamental error in this configuration?
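A minimal sketch of why that configuration fails, assuming the standard additive composition e = x + p + s: elementwise summation requires all three vectors to share the model dimension, so the segment table needs two 768-dimensional rows, not 2-dimensional vectors.

```python
# Demonstrating the shape mismatch caused by segment_embedding_dim = 2.
import torch

x = torch.randn(768)  # token embedding
p = torch.randn(768)  # positional embedding
s = torch.randn(2)    # segment embedding at the misconfigured size

try:
    e = x + p + s     # fails: sizes 768 and 2 cannot be broadcast together
except RuntimeError as err:
    print(err)

# The segment table should have 2 *rows* (sentence A / sentence B), each of
# them still a 768-dimensional vector, e.g. nn.Embedding(2, 768).
```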
Selecting a BERT Variant for a Regulated, On-Device Email Classification Feature
Choosing a BERT Compression Strategy for an On-Prem Document Triage System
Designing a Mobile-Deployable BERT Encoder Under Tight Memory and Latency Constraints
Right-Sizing a BERT Encoder for a Multilingual Support-Ticket Router Without Breaking the Memory Budget
Compressing a BERT-Based Search Re-Ranker for Edge Deployment Without Losing Domain Coverage
Selecting an Efficient BERT Variant for a Domain-Specific Contract Clause Classifier
Training Objective of the Standard BERT Model
A deep sequence model is constructed by stacking multiple layers. Each layer consists of two sub-layers (e.g., a self-attention mechanism and a feed-forward network). A 'post-norm' architecture is used for each sub-layer, which involves applying the sub-layer's main function, adding a residual connection from the input, and then performing layer normalization. If x represents the input to a sub-layer and F(x) represents the output of that sub-layer's main function, which of the following expressions correctly computes the final output of that sub-layer?
A deep sequence model is built by stacking multiple layers. Each layer contains sub-layers (like self-attention or a feed-forward network) that use a 'post-norm' architecture. Arrange the following operations in the correct order as they would occur to transform an input vector within a single sub-layer.
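For reference, a minimal post-norm sub-layer sketch in PyTorch: the sub-layer computes LayerNorm(x + F(x)), i.e. apply the function, add the residual, then normalize. The feed-forward block standing in for F is purely illustrative.

```python
# Post-norm sub-layer: output = LayerNorm(x + F(x)).
import torch
import torch.nn as nn

d_model = 768
norm = nn.LayerNorm(d_model)
F = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                  nn.Linear(4 * d_model, d_model))  # stand-in sub-layer function

x = torch.randn(2, 16, d_model)  # (batch, seq_len, d_model)
out = norm(x + F(x))             # residual add first, layer norm last
print(out.shape)                 # torch.Size([2, 16, 768])
```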
Architectural Component Analysis
Learn After
A researcher is debugging a language model where the input representation for each token is created by summing three distinct vectors: one for the token's identity, one for its position in the sequence, and one for the sentence segment it belongs to. The researcher observes that the model treats the sentences 'The scientist observed the star' and 'The star observed the scientist' as having identical meanings. Which of the three component vectors is most likely being calculated incorrectly or omitted, causing this specific error?
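A small sketch of the symptom, assuming the positional vector is the missing component: a representation built only from token (and segment) vectors is blind to word order. The token ids below are hypothetical stand-ins for the two example sentences.

```python
# Without positions, a pooled bag-of-tokens representation is order-invariant.
import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(10, 8)

sent_a = torch.tensor([1, 2, 3, 1, 4])  # "the scientist observed the star"
sent_b = torch.tensor([1, 4, 3, 1, 2])  # "the star observed the scientist"

# The two orderings produce identical summed representations.
print(torch.allclose(emb(sent_a).sum(0), emb(sent_b).sum(0)))  # True
```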
True or false: in a language model that uses separate vectors for token identity, position, and sentence membership, the final input vector for a token is created by concatenating these three component vectors end-to-end.
Debugging Sentence Pair Representations
Segment Embedding
Example of Input Embedding Composition for a Sentence Pair