TGM is a language model, specifically a neural language model that estimates a probability distribution over the next token or word in a sequence.
TGM estimates its parameters by minimising the following objective function:
$$L(p_\theta, D) = -\sum_{j=1}^{|D|} \sum_{t=1}^{|x^{(j)}|} \log p_\theta\big(x_t^{(j)} \mid x_1^{(j)}, \ldots, x_{t-1}^{(j)}\big)$$
$x_i \in V$ - the vocabulary of tokens
$x = (x_1, \ldots, x_{|x|})$ - a text sequence
$p^*(x)$ - the reference distribution
$D$ - a finite set of text sequences drawn from $p^*$
$p_\theta(x_t \mid x_1, \ldots, x_{t-1})$ - the probability of the next token given the previous tokens in the sequence
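The objective above can be sketched in code: a minimal Python illustration that sums the negative log-probability of each token given its prefix, over every sequence in $D$. The function names (`nll_objective`, `log_prob_next`) and the uniform toy model are hypothetical, chosen only to make the double sum concrete; they are not part of TGM.

```python
import math

def nll_objective(log_prob_next, dataset):
    """Negative log-likelihood L(p_theta, D): for each sequence x in D and
    each position t, accumulate -log p_theta(x_t | x_1, ..., x_{t-1})."""
    total = 0.0
    for x in dataset:                       # x^{(j)}: one token sequence from D
        for t in range(len(x)):
            prefix, target = x[:t], x[t]    # conditioning context and next token
            total -= log_prob_next(prefix, target)
    return total

# Toy stand-in for p_theta: a uniform distribution over a 4-token vocabulary,
# so every token contributes -log(1/|V|) regardless of its prefix.
V = ["a", "b", "c", "d"]
def uniform_log_prob(prefix, target):
    return math.log(1.0 / len(V))

D = [["a", "b"], ["c"]]                     # 3 tokens in total
loss = nll_objective(uniform_log_prob, D)   # 3 * log 4
```

A trained model would replace `uniform_log_prob` with the network's conditional distribution; minimising this quantity in $\theta$ is exactly the objective stated above.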