Learn Before
Model Usage of Transformers
- Encoder-Decoder: maps an input sequence to an output sequence; used for sequence-to-sequence tasks such as machine translation
- Encoder Only: the encoder's outputs serve as a representation of the input sequence; typically used for classification or sequence labeling problems (e.g., BERT)
- Decoder Only: the cross-attention module is removed; typically used for sequence generation, such as language modeling (e.g., GPT); see the code sketch after this list
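To make the three usage patterns concrete, here is a minimal sketch using the Hugging Face transformers library (see the Huggingface Model Summary card under Related). The checkpoint names t5-small, bert-base-uncased, and gpt2 are illustrative assumptions, not part of this card; any checkpoints of the matching architecture would do.

```python
# Minimal sketch of the three Transformer usage patterns.
# Checkpoint names are illustrative assumptions.
from transformers import (
    AutoModel,
    AutoModelForCausalLM,
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
)

# Encoder-Decoder: map an input sequence to an output sequence
# (sequence-to-sequence, e.g., translation or summarization).
tok = AutoTokenizer.from_pretrained("t5-small")
seq2seq = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
inputs = tok("translate English to German: Hello, world!", return_tensors="pt")
out_ids = seq2seq.generate(**inputs, max_new_tokens=20)
print(tok.decode(out_ids[0], skip_special_tokens=True))

# Encoder Only: use the encoder's hidden states as a representation
# of the input, e.g., feed the [CLS] vector to a classifier head.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
inputs = tok("A review to classify.", return_tensors="pt")
cls_vec = encoder(**inputs).last_hidden_state[:, 0]  # [CLS] representation

# Decoder Only: autoregressive generation with no cross-attention;
# the prompt and its continuation live in the same sequence.
tok = AutoTokenizer.from_pretrained("gpt2")
decoder = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("Once upon a time", return_tensors="pt")
out_ids = decoder.generate(**inputs, max_new_tokens=20)
print(tok.decode(out_ids[0], skip_special_tokens=True))
```

Note that only the encoder-decoder model needs separate source and target sequences; the decoder-only model conditions on the prompt purely through self-attention over the growing sequence.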
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Related
Self-attention layers' first approach
Transformers in contextual generation and summarization
Huggingface Model Summary
A Survey of Transformers (Lin et al., 2021)
Overview of a Transformer
Model Usage of Transformers
Attention in vanilla Transformers
Transformer Variants (X-formers)
The Pre-training and Fine-tuning Paradigm
Architectural Categories of Pre-trained Transformers
Computational Cost of Self-Attention in Transformers
Quadratic Complexity's Impact on Transformer Inference Speed
Pre-Norm Architecture in Transformers
Critique of the Transformer Architecture's Core Limitation
A research team is building a model to summarize extremely long scientific papers. They are comparing two distinct architectural approaches:
- Approach 1: Processes the input text sequentially, token by token, updating an internal state that is passed from one step to the next.
- Approach 2: Processes all input tokens simultaneously, using a mechanism that directly relates every token to every other token in the input to determine context.
Which of the following statements best analyzes the primary trade-off between these two approaches for this specific task?
Architectural Design Choice for Machine Translation
Enablers of Universal Language Capabilities
Model Depth in Transformers
Generalization of the Language Modeling Concept
Transformer Block Sub-Layers
Standard Optimization Objective for Transformer Language Models
Learn After
Decoder-Only Transformer as a Language Model
An engineering team is tasked with building a system to perform sentiment analysis on customer reviews. The goal is to classify each review as 'positive', 'negative', or 'neutral'. For accurate classification, the model must understand the full context of the entire review, including how words at the end of a sentence can influence the meaning of words at the beginning. Which of the following architectural approaches is best suited for this specific task?
You are a machine learning engineer evaluating different model architectures for three distinct natural language processing projects. Match each project description with the most suitable architectural approach based on its core requirements.
Architectural Design for a Creative Writing Assistant
Architectural Choice for Document Summarization