Concept

Auto-Regressive (AR) Models

The main task of these models is to predict the next token based on the preceding tokens; they are represented by the GPT model family. AR architectures are built from the decoder part of the Transformer, so a causal masking mechanism is applied during training: the attention computation for a given position can only see the tokens before it. These models are well suited to NLG tasks.
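The causal mask described above can be sketched as follows. This is an illustrative NumPy example, not any specific framework's implementation: a lower-triangular boolean mask blocks attention to future positions by setting their scores to negative infinity before the softmax.

```python
import numpy as np

def causal_mask(seq_len):
    # Lower-triangular matrix: position i may attend only to positions <= i.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_attention_weights(scores, mask):
    # Replace scores at disallowed (future) positions with -inf,
    # so they receive zero weight after the softmax.
    scores = np.where(mask, scores, -np.inf)
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
scores = rng.standard_normal((4, 4))   # toy attention scores for 4 tokens
weights = masked_attention_weights(scores, causal_mask(4))
```

With this mask, the first token can only attend to itself, and every row of `weights` is zero above the diagonal, which is exactly the "only see content before a word" behavior used in AR training.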


Updated 2025-10-06

Tags

Deep Learning (in Machine Learning)

Foundations of Large Language Models Course

Data Science

Computing Sciences