Multiple Choice

A research team is training a small, efficient 'student' model to replicate the behavior of a large, powerful 'teacher' model. The team's goal is to find the optimal parameters for the student model ($\hat{\theta}$) by minimizing a loss function over a dataset of simplified inputs ($\mathcal{D}'$), as defined by the following objective:

$$\hat{\theta} = \arg\min_{\theta} \sum_{\mathbf{x}' \in \mathcal{D}'} \text{Loss}\left(\text{Pr}^t(\cdot|\cdot), \text{Pr}_{\theta}^s(\cdot|\cdot), \mathbf{x}'\right)$$

where $\text{Pr}^t$ is the teacher's output probability distribution and $\text{Pr}_{\theta}^s$ is the student's.
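As a rough illustration of how this objective is evaluated, the sketch below sums a per-input loss between teacher and student distributions over a dataset. The question does not say which Loss is used, so the KL divergence here is an assumption, and `toy_teacher`, `toy_student_matched`, `toy_student_uniform`, and the toy inputs are hypothetical stand-ins rather than real models or data.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); assumed here as the Loss term, which the source leaves unspecified."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_objective(teacher_logits_fn, student_logits_fn, dataset):
    """Sum the per-input loss over every x' in the dataset, as in the objective."""
    total = 0.0
    for x in dataset:
        p_teacher = softmax(teacher_logits_fn(x))
        p_student = softmax(student_logits_fn(x))
        total += kl_divergence(p_teacher, p_student)
    return total

# Hypothetical toy models: each maps a scalar input to three logits.
def toy_teacher(x):
    return [2.0 * x, 0.0, -x]

def toy_student_matched(x):
    return [2.0 * x, 0.0, -x]  # reproduces the teacher exactly

def toy_student_uniform(x):
    return [0.0, 0.0, 0.0]  # ignores the input entirely

simplified_inputs = [0.5, 1.0, 1.5]  # stands in for the simplified dataset D'

loss_matched = distillation_objective(toy_teacher, toy_student_matched, simplified_inputs)
loss_uniform = distillation_objective(toy_teacher, toy_student_uniform, simplified_inputs)
```

In this toy setting the matched student incurs zero loss and the uniform student a positive one. Minimizing over $\theta$ would mean adjusting the student's parameters to drive this sum down on whichever dataset is passed in.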

If the team mistakenly configures the training process to use the teacher's original, complex dataset instead of the intended simplified dataset $\mathcal{D}'$, which of the following outcomes is the most direct and likely consequence for the student model?

Updated 2025-10-02

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science