1Cademy - An engineer is training a neural network for a next-word prediction task. During each training iteration, the model is provided with the correct preceding words from the training data to predict the next word at each position in a sequence. The model is designed to calculate the prediction errors for all positions in the sequence simultaneously within a single computational pass. Which of the following best explains the architectural property that is essential for this parallel and efficient tra

Learn Before

Standard Optimization Objective for Transformer Language Models

Multiple Choice

An engineer is training a neural network for a next-word prediction task. During each training iteration, the model is provided with the correct preceding words from the training data to predict the next word at each position in a sequence. The model is designed to calculate the prediction errors for all positions in the sequence simultaneously within a single computational pass. Which of the following best explains the architectural property that is essential for this parallel and efficient tra

Updated 2025-09-26

Contributors are:

Who are from:

Learn Before

Related