Learn Before
A research team is developing a new large-scale transformer-based language model and is deciding on the activation function for the feed-forward networks. A senior engineer advocates for using the Gaussian Error Linear Unit (GELU). Which statement best evaluates the rationale for this choice, considering its historical application in influential models?
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Analysis of Activation Function Choice in Transformer Architectures
A financial analyst asks a large language model, 'What was the closing stock price for ACME Corp today?' The model, with a knowledge cutoff of last year, responds: 'I cannot provide real-time information. My data is not current.' To make the model more useful for this task without retraining it, it is integrated with an external tool that can access a live stock market data feed. Which statement best analyzes the primary advantage of this approach for this specific problem?