Concept

Mini example

Here’s the algorithm in detail for our mini example:

  1. Select three embeddings from $E$: Given the three previous words, we look up their indices, create 3 one-hot vectors, and then multiply each by the embedding matrix $E$. Consider $w_{t-3}$: the one-hot vector for ‘for’ (index 35) is multiplied by the embedding matrix $E$ to give the first part of the first hidden layer, the embedding layer. Since each column of the input matrix $E$ is an embedding for a word, and the input is a one-hot column vector $x_i$ for word $V_i$, the embedding layer for input $w$ will be $Ex_i = e_i$, the embedding for word $i$. We now concatenate the three embeddings for the three context words to produce the embedding layer $e$.

  2. Multiply by $W$: We multiply $e$ by $W$ (and add $b$) and pass the result through the ReLU (or another) activation function to get the hidden layer $h$.

  3. Multiply by $U$: $h$ is now multiplied by $U$ to produce the output scores $z$.

  4. Apply softmax: After the softmax, each node $i$ in the output layer estimates the probability $P(w_t = i \mid w_{t-1}, w_{t-2}, w_{t-3})$; a code sketch of all four steps follows this list.
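To make the four steps concrete, here is a minimal NumPy sketch of the forward pass. The sizes (a 100-word vocabulary, embedding dimension 8, hidden size 16), the random weights, and the context indices other than 35 (‘for’, which comes from the text) are illustrative assumptions, not values from the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): |V| = 100 words, embedding dim d = 8,
# hidden size d_h = 16, context window of 3 previous words.
V, d, d_h = 100, 8, 16

E = rng.normal(size=(d, V))        # embedding matrix: one column per vocabulary word
W = rng.normal(size=(d_h, 3 * d))  # hidden-layer weights
b = rng.normal(size=d_h)           # hidden-layer bias
U = rng.normal(size=(V, d_h))      # output-layer weights

def one_hot(i, n):
    x = np.zeros(n)
    x[i] = 1.0
    return x

def softmax(z):
    z = z - z.max()                # subtract the max for numerical stability
    expz = np.exp(z)
    return expz / expz.sum()

# Step 1: look up the three context embeddings and concatenate them.
# 'for' is at index 35 in the mini example; the other two indices are made up.
context = [35, 9, 51]              # indices of w_{t-3}, w_{t-2}, w_{t-1}
e = np.concatenate([E @ one_hot(i, V) for i in context])  # E x_i = e_i

# Step 2: multiply by W, add b, apply the ReLU activation.
h = np.maximum(0.0, W @ e + b)

# Step 3: multiply by U to get the output scores z.
z = U @ h

# Step 4: softmax turns z into a distribution over the vocabulary;
# y_hat[i] estimates P(w_t = i | w_{t-1}, w_{t-2}, w_{t-3}).
y_hat = softmax(z)
print(y_hat.shape, y_hat.sum())    # (100,) ~1.0
```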

In summary, the equations for a neural language model with a window size of 3, given one-hot input vectors for each input context word, are:

$$
\begin{aligned}
e &= [Ex_{t-3};\, Ex_{t-2};\, Ex_{t-1}] \\
h &= \sigma(We + b) \\
z &= Uh \\
\hat{y} &= \mathrm{softmax}(z)
\end{aligned}
\tag{7.22}
$$

Note that we formed the embedding layer $e$ by concatenating the 3 embeddings for the three context vectors; we’ll often use semicolons to mean concatenation of vectors.
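As a sanity check on Eq. 7.22, the shapes must line up as follows; here $d$ is the embedding dimension and $d_h$ the hidden-layer size, symbols we introduce for bookkeeping rather than names used in this excerpt:

$$
E \in \mathbb{R}^{d \times |V|},\quad
x_i \in \mathbb{R}^{|V|},\quad
e \in \mathbb{R}^{3d},\quad
W \in \mathbb{R}^{d_h \times 3d},\quad
b, h \in \mathbb{R}^{d_h},\quad
U \in \mathbb{R}^{|V| \times d_h},\quad
z, \hat{y} \in \mathbb{R}^{|V|}
$$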


Updated 2021-10-24

Tags

Data Science