Multiple Choice

A neural network produces a final matrix of hidden state vectors, H, with dimensions [sequence_length × hidden_dimension]. To generate a probability distribution over a vocabulary of size V for each position in the sequence, a parameterized Softmax layer is used, which computes Softmax(H ⋅ W). What is the primary role and required shape of the weight matrix W in this operation?
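The intended answer can be checked dimensionally: W must have shape [hidden_dimension × V] so that H ⋅ W yields a [sequence_length × V] logit matrix, which Softmax then normalizes row-wise into per-position vocabulary distributions. A minimal sketch with illustrative sizes (the variable names and dimensions below are assumptions for demonstration, not from the question):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d, V = 4, 8, 10            # illustrative sizes (assumed)
H = np.random.randn(seq_len, d)     # hidden states: [sequence_length x hidden_dimension]
W = np.random.randn(d, V)           # projection:    [hidden_dimension x V]

probs = softmax(H @ W)              # logits [seq_len x V] -> probabilities

assert probs.shape == (seq_len, V)            # one distribution per position
assert np.allclose(probs.sum(axis=-1), 1.0)   # each row sums to 1
```

W thus plays the role of an output (or "unembedding") projection, mapping each hidden state from the model's hidden space into a score for every vocabulary item before normalization.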

Updated 2025-09-26

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science