Evaluating an Activation Function Change in a Transformer FFN
Based on the provided scenario, evaluate the engineer's proposed solution. Justify your evaluation by comparing the properties of the original activation function with the proposed one and explaining how the change could address the observed problem.
Tags
Data Science
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Gaussian Error Linear Unit (GELU)
Gated Linear Unit (GLU)
A machine learning engineer is analyzing the feed-forward network (FFN) component of a transformer model. They want to replace the standard Rectified Linear Unit (ReLU) activation function with a more modern alternative to potentially improve model performance. Which of the following statements best analyzes the rationale for using a function like the Gaussian Error Linear Unit (GELU) or Swish over ReLU in this context?
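To make the comparison concrete, here is a minimal standalone sketch (not part of the course materials) of the three activations the question contrasts. The GELU uses the common tanh approximation; the key properties to note are that ReLU zeroes out all negative inputs (gradients are exactly zero there), while GELU and Swish are smooth and pass small negative values through.

```python
import math

def relu(x):
    # ReLU: hard cutoff at 0; zero gradient for all x < 0 ("dying ReLU" risk)
    return max(0.0, x)

def gelu(x):
    # GELU (tanh approximation): smooth, weights x by an approximate
    # Gaussian CDF; slightly negative outputs for small negative x
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

def swish(x, beta=1.0):
    # Swish (a.k.a. SiLU when beta=1): x * sigmoid(beta * x);
    # smooth, non-monotonic near zero, no hard gradient cutoff
    return x / (1.0 + math.exp(-beta * x))

# Compare behavior around zero, where the three functions differ most
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):+.4f}  "
          f"gelu={gelu(x):+.4f}  swish={swish(x):+.4f}")
```

Note how `gelu(-0.5)` and `swish(-0.5)` are small negative numbers rather than exactly zero, which is the usual rationale for preferring them in transformer FFNs: gradients still flow for negative pre-activations.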
Match each activation function, which can be used in the feed-forward network of a transformer model, with its corresponding description.