The classifier for word2vec is a binary logistic regression, which applies sigmoid function.
given a tuple of words(w,c), where wis the target word, cis one of the context words:
The possibility that c is a context word:
P(+∣w,c)=σ(c⋅w)=1+exp(−,c⋅w)1
The possibility that c is not a context word:
P(−∣w,c)=1−P(+∣w,c)=σ(−,c⋅w)=1+exp(c⋅w)1
But there are several context words:
P(+∣w,c1:L)=∏i=1Lσ(ci⋅w)=∏i=1L1+exp(−,ci⋅w)1