Multiple Choice

Consider a 4-class classification problem where the final layer of a model produces the following pre-activation scores for a single input: [1.0, 2.0, 1.5, 5.0]. The model then uses an activation function that exponentiates each score and normalizes the results to produce a probability distribution. Without performing the full calculation, which of the following statements best describes the resulting probability distribution?
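The activation described here (exponentiate each score, then normalize) is the softmax function. You can use it to check your reasoning after answering. The Python sketch below assumes the standard softmax definition. The helper name `softmax` is our own. It computes the distribution for the scores in the question. Subtracting the maximum score before exponentiating is a common numerical-stability step and does not change the resulting probabilities.

```python
import math


def softmax(scores):
    """Exponentiate each score and normalize so the outputs sum to 1."""
    # Shift by the max score for numerical stability; the ratios are unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 2.0, 1.5, 5.0]
probs = softmax(scores)
for s, p in zip(scores, probs):
    print(f"score {s:.1f} -> probability {p:.3f}")
```

Exponentiation preserves the ranking of the scores but magnifies the differences between them. The largest score exceeds the next largest by 3.0, so its probability is e^3 (roughly 20) times larger.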


Updated 2025-09-28


Tags

Data Science

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science