Encoding Process in Permuted Language Modeling

While standard pre-training tasks typically employ an encoding process in which every token can attend to the entire sequence via self-attention, permuted language modeling deviates from this norm. To enable autoregressive prediction within an encoder-only architecture, it applies attention masks on the encoder side: a factorization (permutation) order over the positions is sampled, and each token is allowed to attend only to the tokens that precede it in that order.
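To make the masking concrete, the sketch below (a minimal PyTorch illustration, not code from the book; the function name permutation_attention_mask is hypothetical) samples a factorization order and builds the Boolean mask that lets each position attend only to positions appearing earlier in that order.

```python
import torch

def permutation_attention_mask(seq_len: int) -> torch.Tensor:
    """Illustrative mask for permuted language modeling.

    A factorization order over the positions is sampled, and position i
    may attend to position j only if j appears earlier than i in that
    order. True = attention allowed. (Full XLNet-style training also uses
    a two-stream design, which is omitted here for brevity.)
    """
    # Sample a random factorization (permutation) order over positions.
    order = torch.randperm(seq_len)

    # rank[p] = where position p appears in the sampled order.
    rank = torch.empty(seq_len, dtype=torch.long)
    rank[order] = torch.arange(seq_len)

    # Position i attends to position j only if j precedes i in the order.
    mask = rank.unsqueeze(1) > rank.unsqueeze(0)  # shape: (seq_len, seq_len)
    return mask

# Example: for a length-5 sequence, the printed Boolean matrix is the mask
# that would be applied inside the encoder's self-attention layers.
if __name__ == "__main__":
    torch.manual_seed(0)
    print(permutation_attention_mask(5))
```

Because only the mask changes, the same encoder stack can be reused: the permutation is reflected entirely in which attention connections are switched off, not in the order of the input tokens themselves.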
