Learn Before
Multiple Choice

Two different machine learning models, Model A and Model B, use a parameterized function to convert a vector of raw scores into a probability distribution. Model A uses the function denoted Softmax_{w_A}(·), and Model B uses Softmax_{w_B}(·). When given the exact same input vector, Model A produces the output [0.7, 0.2, 0.1] and Model B produces [0.3, 0.6, 0.1]. What is the most logical conclusion that can be drawn from this observation?
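The scenario above can be sketched in code. This is a minimal illustration, assuming the parameterized softmax applies a learned weight matrix to the input before normalizing (the function names, shapes, and weight values here are hypothetical, chosen only to show that identical inputs with different parameters yield different distributions):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def parameterized_softmax(W, x):
    # Assumed form: Softmax_W(x) = softmax(W @ x). The learned weights W
    # transform the raw scores before normalization, so models with
    # different W map the same input to different distributions.
    return softmax(W @ x)

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 0.5])      # the same input vector for both models
W_A = rng.normal(size=(3, 3))      # Model A's parameters (illustrative)
W_B = rng.normal(size=(3, 3))      # Model B's parameters (illustrative)

p_A = parameterized_softmax(W_A, x)
p_B = parameterized_softmax(W_B, x)

print(p_A, p_B)
# Both outputs are valid probability distributions (non-negative, summing
# to 1), yet they differ because W_A != W_B.
```

The point of the sketch: the divergence in outputs is explained entirely by the parameters, not by the softmax operation itself, which is deterministic for a fixed parameter setting.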


Updated 2025-10-01


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science