Learn Before
  • Language Models (LMs)

  • Transformer

Generalization of the Language Modeling Concept

With the rise of the Transformer architecture, the concept of language modeling was generalized. It no longer meant only predicting the next token in a sequence. It now also covered models that learn to predict words in other ways, for example recovering a masked word from the context on both sides of it. Many powerful Transformer-based models were pre-trained on these diverse word prediction tasks and then applied successfully to a wide variety of downstream tasks.
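The following minimal Python sketch makes the contrast concrete. It assumes that whitespace-split words stand in for tokens, and it involves no actual model. It only builds training examples for two objectives:

- **Next-token objective:** pairs each prefix with the token that follows it.
- **Masked-prediction objective:** hides chosen positions and asks for the original tokens given the rest of the sequence.

The `[MASK]` string mirrors the placeholder used by BERT-style models. The function names are hypothetical.

```python
# Illustrative sketch only: words stand in for tokens and no model is trained.
MASK = "[MASK]"  # placeholder string, as used by BERT-style masked models


def next_token_examples(tokens):
    """Traditional LM objective: each prefix predicts the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_token_examples(tokens, positions):
    """Generalized objective: hide tokens at `positions` and predict them
    from the remaining context, which lies on both sides of each gap."""
    corrupted = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in sorted(positions)}
    return corrupted, targets


if __name__ == "__main__":
    tokens = "the quick brown fox jumps".split()
    for context, target in next_token_examples(tokens):
        print(context, "->", target)
    # ['the'] -> quick
    # ['the', 'quick'] -> brown
    # ['the', 'quick', 'brown'] -> fox
    # ['the', 'quick', 'brown', 'fox'] -> jumps
    print(masked_token_examples(tokens, {3}))
    # (['the', 'quick', 'brown', '[MASK]', 'jumps'], {3: 'fox'})
```

In the masked example the model may use "jumps", which comes after the hidden word. A strict next-token predictor never sees that word when it predicts "fox".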

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Related
  • Types of Language Models

  • Evaluating language models

  • Shannon's Foundational Work on Language Modeling

  • Generalization of the Language Modeling Concept

  • Chain Rule for Sequence Probability

  • Deep Learning Approach to Language Modeling

  • Output Token Sequence in LLMs

  • Start of Sentence (SOS) Token

  • [CLS] Token as a Start Symbol

  • A system is designed to predict the probability of a sequence of words. For the sequence 'The dog ran', the system provides the following conditional probabilities:

    • The probability of 'The' occurring at the start of a sequence is 0.2.
    • The probability of 'dog' occurring after 'The' is 0.3.
    • The probability of 'ran' occurring after 'The dog' is 0.7.

    Based on the fundamental principle used by such systems to determine the likelihood of a full sequence, what is the overall probability of the sequence 'The dog ran'?

  • Analyzing Language Model Probability Assignments

  • A system's primary goal is to predict the probability of a sequence of tokens. To calculate the total probability for the sequence 'The quick brown fox', it breaks the problem down into a series of conditional probability calculations. Arrange the following calculations in the correct order that the system would use to find the total probability of the sequence.

  • Evaluating a Language Model's Probabilistic Output

  • Self-attention layers' first approach

  • Transformers in contextual generation and summarization

  • Huggingface Model Summary

  • A Survey of Transformers (Lin et al., 2021)

  • Overview of a Transformer

  • Model Usage of Transformers

  • Attention in vanilla Transformers

  • Transformer Variants (X-formers)

  • The Pre-training and Fine-tuning Paradigm

  • Architectural Categories of Pre-trained Transformers

  • Computational Cost of Self-Attention in Transformers

  • Quadratic Complexity's Impact on Transformer Inference Speed

  • Pre-Norm Architecture in Transformers

  • Critique of the Transformer Architecture's Core Limitation

  • A research team is building a model to summarize extremely long scientific papers. They are comparing two distinct architectural approaches:

    • Approach 1: Processes the input text sequentially, token by token, updating an internal state that is passed from one step to the next.
    • Approach 2: Processes all input tokens simultaneously, using a mechanism that directly relates every token to every other token in the input to determine context.

    Which of the following statements best analyzes the primary trade-off between these two approaches for this specific task?

  • Architectural Design Choice for Machine Translation

  • Enablers of Universal Language Capabilities

  • Model Depth in Transformers

  • Transformer Block Sub-Layers

  • Standard Optimization Objective for Transformer Language Models

Learn After
  • Which of the following scenarios best exemplifies the generalization of the language modeling concept beyond its traditional definition of strictly predicting the next word in a sequence?

  • A model designed solely to fill in a blank word in the middle of a sentence (e.g., 'The quick brown ___ jumps over the lazy dog') is performing the task of language modeling according to its original, traditional definition.

  • Evolution of Language Modeling