Multiple Choice

A researcher modifies the position-wise sub-layer within a sequence processing model. The standard design for this sub-layer applies, in order, a linear transformation, a non-linear activation, and a second linear transformation. The researcher's modification adds a second non-linear activation function immediately after the final linear transformation. Which of the following best evaluates the impact of this architectural change?
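
To make the two designs concrete, here is a minimal sketch of both variants in plain Python. The question does not name the activation function, so ReLU is an assumption here, as are the dimensions, the zero biases, and the random weights; the function names `ffn_standard` and `ffn_modified` are hypothetical labels, not part of any library.

```python
import random


def relu(v):
    # Element-wise ReLU (assumed activation; the question does not name one).
    return [max(0.0, x) for x in v]


def linear(x, W, b):
    # Affine map y = W x + b, with W stored as a list of rows (d_out x d_in).
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def ffn_standard(x, W1, b1, W2, b2):
    # Standard design: linear -> activation -> linear.
    return linear(relu(linear(x, W1, b1)), W2, b2)


def ffn_modified(x, W1, b1, W2, b2):
    # Researcher's variant: a second activation after the final linear map.
    return relu(ffn_standard(x, W1, b1, W2, b2))


def random_matrix(rng, rows, cols):
    # Illustrative random weights in [-1, 1]; not trained parameters.
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_hidden = 4, 8  # illustrative sizes only

W1, b1 = random_matrix(rng, d_hidden, d_model), [0.0] * d_hidden
W2, b2 = random_matrix(rng, d_model, d_hidden), [0.0] * d_model
x = [rng.uniform(-1.0, 1.0) for _ in range(d_model)]  # one position's vector

y_std = ffn_standard(x, W1, b1, W2, b2)
y_mod = ffn_modified(x, W1, b1, W2, b2)

print("standard:", y_std)
print("modified:", y_mod)
```

Under the ReLU assumption, every component of `y_mod` is non-negative, while `y_std` can take either sign. An activation with a different range would constrain the sub-layer's output differently.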

Updated 2025-10-06

Tags

Data Science

Foundations of Large Language Models Course

Computing Sciences

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science