Concept

Increasing Idiom Context

Giving the model more context and more pretraining improves (i.e. lowers) perplexity. Training on the joint split consistently improves randomly initialized models, while upsampling the idiom-train data hurts performance because the model overfits.
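As a reminder of the metric, perplexity is the exponential of the mean per-token negative log-likelihood, so lower values mean the model assigns higher probability to the text. A minimal sketch (the per-token loss values below are hypothetical, purely for illustration):

```python
import math

def perplexity(nll_per_token):
    # Perplexity = exp(mean negative log-likelihood per token).
    # Lower perplexity = the model finds the text more predictable.
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Hypothetical per-token losses: with longer context the model should
# predict the idiom's tokens better, yielding lower perplexity.
short_context_nll = [3.2, 4.1, 3.8]
long_context_nll = [2.9, 3.5, 3.1]
assert perplexity(long_context_nll) < perplexity(short_context_nll)
```

A model that assigned probability 1 to every token would reach the floor of perplexity 1; any uncertainty pushes the value above that.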


Updated 2023-02-17

Tags

Data Science