Backtranslation
Backtranslation is a way of making use of monolingual corpora in the target language by creating synthetic bitexts. We first train an intermediate target-to-source MT system on the small bitext that is available, use it to translate the monolingual target-language data into the source language, and then add the resulting synthetic bitext (machine-generated source sentences paired with genuine target sentences) to the training data of the source-to-target system. Backtranslation involves several design choices. One is how we generate the backtranslated data: we can run the decoder with greedy inference or use beam search, or we can use sampling, also called Monte Carlo decoding. In Monte Carlo decoding, at each search timestep, instead of always generating the word with the highest softmax probability, we roll a weighted die and use it to choose the next word according to its softmax probability.
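To make the two decoding choices concrete, here is a minimal sketch of the backtranslation loop, assuming a hypothetical toy target-to-source "model" (`toy_model`, with an invented English-to-German `LEXICON`) that returns next-token logits. It is not a real MT system or any library's API. `greedy_step` always takes the highest-probability token, while `monte_carlo_step` rolls a weighted die with `random.Random.choices`, so each token is picked with probability equal to its softmax probability. `backtranslate` pairs each synthetic source sentence with its genuine target sentence.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

EOS = "</s>"
Probs = Dict[str, float]
# Hypothetical model interface: (target sentence, source prefix) -> next-token logits.
Model = Callable[[Sequence[str], Sequence[str]], Dict[str, float]]
Step = Callable[[Probs, random.Random], str]

# Hypothetical toy lexicon: target (English) word -> ranked source (German) options.
LEXICON = {"the": ["das", "die"], "house": ["Haus", "Gebäude"],
           "is": ["ist"], "small": ["klein"]}

def toy_model(target: Sequence[str], prefix: Sequence[str]) -> Dict[str, float]:
    """Stand-in for a trained target-to-source model (word-by-word, illustration only)."""
    i = len(prefix)
    if i >= len(target):                 # whole target covered: strongly prefer EOS
        return {EOS: 5.0, "und": 0.0}
    logits = {EOS: -2.0}                 # small chance of stopping early
    for rank, tok in enumerate(LEXICON.get(target[i], [target[i]])):
        logits[tok] = 3.0 - 2.0 * rank   # best translation gets the highest logit
    return logits

def softmax(logits: Dict[str, float]) -> Probs:
    m = max(logits.values())             # subtract max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

def greedy_step(probs: Probs, _rng: random.Random) -> str:
    # Greedy inference: always the single most probable token.
    return max(probs, key=probs.get)

def monte_carlo_step(probs: Probs, rng: random.Random) -> str:
    # Monte Carlo decoding: a weighted die roll over the softmax distribution.
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

def decode(model: Model, target: Sequence[str], step: Step,
           rng: random.Random, max_len: int = 20) -> List[str]:
    prefix: List[str] = []
    while len(prefix) < max_len:
        tok = step(softmax(model(target, prefix)), rng)
        if tok == EOS:
            break
        prefix.append(tok)
    return prefix

def backtranslate(model: Model, monolingual_target: Sequence[Sequence[str]],
                  step: Step, seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Return synthetic bitext pairs: (generated source, genuine target)."""
    rng = random.Random(seed)
    return [(decode(model, tgt, step, rng), list(tgt)) for tgt in monolingual_target]

if __name__ == "__main__":
    mono = [["the", "house", "is", "small"], ["the", "house"]]
    for name, step in [("greedy", greedy_step), ("monte carlo", monte_carlo_step)]:
        for src, tgt in backtranslate(toy_model, mono, step, seed=1):
            print(f"{name}: {' '.join(src)} ||| {' '.join(tgt)}")
```

With greedy decoding every run yields the same synthetic source for a given target sentence; with Monte Carlo decoding, lower-probability alternatives such as "die" or "Gebäude" occasionally appear, which produces more varied synthetic source sentences.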
Tags
Data Science
Related
Application of autoregressive generation given a prefix: Machine translation
Statistical Machine Translation vs Neural Machine Translation
Backtranslation
MT Evaluation
MT Corpora
Assessing Translation Effectiveness for a Specific Use Case
A company is developing a translation service for legal documents, where preserving the precise meaning and complex sentence structure of the original text is the highest priority. The company has access to a massive parallel corpus of legal texts. Given these requirements, which approach would be more suitable and why?
Evaluating Machine Translation Quality
Unaligned Data in Sequence Learning