Multiple Choice

An engineer is training a small 'student' model to mimic the predictions of a larger, pre-trained 'teacher' model. The training objective is to make the student's final pre-activation output vector (its logits) as similar as possible to the teacher's. If z_t is the teacher's output vector and z_s is the student's output vector for the same input, which of the following loss functions correctly implements this objective?
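To make the setup concrete, here is a minimal sketch of one common way to compare two logit vectors, assuming a mean squared error criterion. The function name logit_matching_loss and the example values are hypothetical and are not taken from the question or its answer options.

```python
# Hypothetical helper (an assumption, not from the source): mean squared
# error between a teacher logit vector z_t and a student logit vector z_s.
def logit_matching_loss(z_t, z_s):
    """Return the mean squared difference between two equal-length vectors."""
    if len(z_t) != len(z_s):
        raise ValueError("teacher and student logits must have the same length")
    # Compare the raw pre-activation values directly; no softmax is applied.
    return sum((t - s) ** 2 for t, s in zip(z_t, z_s)) / len(z_t)


# Made-up logits for a single input, for illustration only.
teacher_logits = [2.0, -1.0, 0.5]
student_logits = [1.5, -0.5, 0.5]
loss = logit_matching_loss(teacher_logits, student_logits)
print(loss)
```

The loss is zero only when the two vectors are identical and grows as the student's logits drift away from the teacher's, which is the behavior the stated objective asks for.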

Updated 2025-09-29

Tags

Deep Learning (in Machine learning)

Data Science

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science