Learn Before
The Gaussian Error Linear Unit (GELU) activation function is defined as GELU(x) = x · Φ(x), where Φ(x) represents the cumulative distribution function (CDF) of the standard normal distribution (a bell curve centered at zero). Based on this formula, what is the output of the function for an input value of ?
0
1
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is tasked with improving a Chinese-to-English translation. The desired process is for the model to first explicitly identify any errors in an initial translation and then generate a corrected version based on that analysis. Which of the following prompt structures correctly instructs the model to perform this specific two-step task?
The GELU activation function is defined as GELU(x) = x · Φ(x), where Φ(x) is the cumulative distribution function (CDF) of the standard normal distribution. Based on the properties of the CDF, how does the output of the GELU function behave for a very large negative input (i.e., x → −∞) versus a very large positive input (i.e., x → +∞)?
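The limiting behavior asked about above can be checked numerically. This is a minimal sketch, assuming the exact GELU formulation GELU(x) = x · Φ(x), with Φ(x) computed from the error function as Φ(x) = 0.5 · (1 + erf(x / √2)):

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: x times the standard normal CDF Phi(x),
    # where Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(gelu(0.0))    # 0.0: the leading x factor zeroes the output
print(gelu(-10.0))  # near 0: Phi(x) -> 0 as x -> -infinity
print(gelu(10.0))   # near 10: Phi(x) -> 1, so GELU(x) approaches x
```

For large negative inputs the CDF factor drives the output toward 0, while for large positive inputs the CDF approaches 1 and GELU(x) approaches x itself, matching the behavior the question probes.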