The word2vec classifier is a binary logistic regression: it applies the sigmoid function to the dot product of two word embeddings.
Given a pair of words $(w, c)$, where $w$ is the target word and $c$ is a candidate context word:
The probability that $c$ is a context word:
$$P(+ \mid w, c) = \sigma(c \cdot w) = \frac{1}{1 + \exp(-c \cdot w)}$$
The probability that $c$ is not a context word:
$$P(- \mid w, c) = 1 - P(+ \mid w, c) = \sigma(-c \cdot w) = \frac{1}{1 + \exp(c \cdot w)}$$
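The two probabilities above can be checked numerically with a small sketch. The helper names (`sigmoid`, `dot`, `p_positive`, `p_negative`) and the three-dimensional example vectors are assumptions made for illustration; they are not taken from any word2vec implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def dot(u, v) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def p_positive(w, c) -> float:
    """P(+ | w, c): probability that c is a context word of w."""
    return sigmoid(dot(c, w))


def p_negative(w, c) -> float:
    """P(- | w, c): probability that c is not a context word of w."""
    return sigmoid(-dot(c, w))


# Hypothetical toy embeddings; real embeddings have far more dimensions.
w = [0.5, -1.0, 2.0]
c = [1.0, 0.5, 0.25]

print(p_positive(w, c), p_negative(w, c))
```

Because $\sigma(-x) = 1 - \sigma(x)$, the two printed values should sum to 1, which matches the identity in the second equation.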
A target word has several context words, however. Assuming the context words are independent, their probabilities multiply:
$$P(+ \mid w, c_{1:L}) = \prod_{i=1}^{L} \sigma(c_i \cdot w) = \prod_{i=1}^{L} \frac{1}{1 + \exp(-c_i \cdot w)}$$
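A product of many values in $(0, 1)$ shrinks quickly, so in practice it is common to work with the logarithm, $\log P(+ \mid w, c_{1:L}) = \sum_{i=1}^{L} \log \sigma(c_i \cdot w)$. The following is a minimal sketch of that computation; the function names and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """log(sigmoid(x)), computed without overflowing for large |x|."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_p_all_context(w, contexts) -> float:
    """log P(+ | w, c_1..c_L): sum of log-sigmoids of the dot products."""
    return sum(log_sigmoid(dot(c, w)) for c in contexts)


# Hypothetical target vector and L = 3 context vectors.
w = [0.5, -1.0, 2.0]
contexts = [
    [1.0, 0.5, 0.25],
    [0.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0],
]

log_p = log_p_all_context(w, contexts)
p = math.exp(log_p)  # back to the product form of the equation
print(log_p, p)
```

Exponentiating the summed log-probabilities recovers the product in the equation, so both forms describe the same quantity.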