Multiple Choice

A machine learning team is using a combined objective to train a small 'student' model. The goal is to find the student model's parameters $\theta$ that maximize the following expression:

$$\tilde{\theta} = \arg \max_{\theta} \sum_{(\mathbf{x}, \mathbf{y}) \in \mathcal{D}} \log \Pr{}_{\theta}^{s}(\mathbf{y}|\mathbf{x}) - \lambda \cdot \text{Loss}_{\text{kd}}$$

The first term, $\log \Pr{}_{\theta}^{s}(\mathbf{y}|\mathbf{x})$, measures how well the student predicts the ground-truth labels $\mathbf{y}$. The second term, $\text{Loss}_{\text{kd}}$, measures the difference between the student's and a larger 'teacher' model's predictions. The team is working with a dataset whose ground-truth labels are known to be somewhat noisy and contain occasional errors. However, the large teacher model has been shown to provide very reliable and well-generalized predictions. Given this situation, how should the team adjust the hyperparameter $\lambda$ to optimize the student model's performance?
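The trade-off in the objective can be made concrete with a small numerical sketch. Below, $\text{Loss}_{\text{kd}}$ is instantiated as the KL divergence between teacher and student distributions, which is one common choice but an assumption here; the function names and toy data are likewise illustrative only.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def combined_objective(student_logits, teacher_logits, labels, lam):
    """Objective from the question: sum of log-likelihoods of the
    ground-truth labels, minus lambda times a KD loss (assumed here
    to be KL(teacher || student), summed over the batch)."""
    s = softmax(student_logits)
    t = softmax(teacher_logits)
    idx = np.arange(len(labels))
    log_lik = np.log(s[idx, labels]).sum()        # sum_(x,y) log Pr_s(y|x)
    kd_loss = (t * (np.log(t) - np.log(s))).sum() # Loss_kd as KL divergence
    return log_lik - lam * kd_loss

# Toy batch: 2 examples, 3 classes. The second label disagrees with
# both models' predictions, simulating label noise in the dataset.
student = np.array([[2.0, 0.5, 0.1], [0.3, 1.8, 0.2]])
teacher = np.array([[2.5, 0.2, 0.0], [0.1, 2.2, 0.1]])
labels  = np.array([0, 2])

# Increasing lambda penalizes disagreement with the (reliable) teacher
# more heavily, reducing the influence of the noisy ground-truth term.
print(combined_objective(student, teacher, labels, lam=0.0))
print(combined_objective(student, teacher, labels, lam=5.0))
```

Because the KL term is non-negative, raising $\lambda$ shifts the optimum toward matching the teacher and away from fitting the (noisy) labels exactly, which is the intuition the question is probing.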

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science
