1Cademy - An engineer is implementing a reward model where the final scalar score `r` is computed from the last hidden state vector `h_last` using the formula `r = h_last * W_r`. If the hidden state vector `h_last` has dimensions of `[1 x 4096]`, what must be the dimensions of the weight matrix `W_r` for the formula to produce a single scalar value?

Learn Before

Reward Score Formula for LLM-based Reward Models

Multiple Choice

An engineer is implementing a reward model where the final scalar score `r` is computed from the last hidden state vector `h_last` using the formula `r = h_last * W_r`. If the hidden state vector `h_last` has dimensions of `[1 x 4096]`, what must be the dimensions of the weight matrix `W_r` for the formula to produce a single scalar value?
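The shape requirement can be checked numerically. The sketch below assumes that `*` in the formula denotes matrix multiplication and uses NumPy with random placeholder values; `hidden_size`, `rng`, and `reward` are illustrative names, not part of the original question. For a product of a `[1 x 4096]` vector with `W_r` to be defined, `W_r` needs 4096 rows, and for the result to be a single value it needs 1 column.

```python
import numpy as np

hidden_size = 4096  # taken from the question: h_last is [1 x 4096]

# Random placeholder values; only the shapes matter for this check.
rng = np.random.default_rng(0)
h_last = rng.standard_normal((1, hidden_size))  # last hidden state, shape (1, 4096)
W_r = rng.standard_normal((hidden_size, 1))     # candidate weight matrix, shape (4096, 1)

# Assumption: "*" in the formula is read as a matrix product.
r = h_last @ W_r       # (1, 4096) @ (4096, 1) -> (1, 1)
print(r.shape)         # (1, 1)

reward = r.item()      # extract the single entry as a Python float
```

With any other inner dimension, for example a `W_r` of shape `(1, 4096)`, the matrix product is undefined and NumPy raises a `ValueError`.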

Updated 2025-10-08
