Bilingual alignment as an unsupervised NMT component to exploit monolingual data
Bilingual alignment addresses the open problem of how to obtain an initial alignment between two languages. Bilingual word embeddings let an NMT system either start from a word-by-word translation derived from the shared embedding space or initialize its embedding parameters from the bilingual embeddings. A denoising auto-encoder (DAE) can build a shared latent space for the two languages by learning to reconstruct sentences in each language from a noised version. In unsupervised statistical machine translation, the initial alignment can be obtained from back-translated corpora generated by an unsupervised NMT system. Language-model pre-training can also serve as an initialization.
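As a minimal sketch of the word-by-word translation idea: once source and target embeddings have been mapped into a shared space (e.g. by an unsupervised alignment method), each source word can be translated to its nearest target word by cosine similarity. The toy vectors and the names `src_emb`, `tgt_emb`, and `translate_word` below are illustrative assumptions, not part of any specific system.

```python
import numpy as np

# Toy bilingual embeddings, assumed to already live in a shared space
# (e.g. after an unsupervised cross-lingual mapping step).
src_emb = {"chat": np.array([0.9, 0.1]), "chien": np.array([0.1, 0.9])}
tgt_emb = {"cat": np.array([0.88, 0.12]), "dog": np.array([0.12, 0.88])}

def translate_word(word):
    """Word-by-word translation: pick the target word whose embedding
    has the highest cosine similarity with the source word's embedding."""
    v = src_emb[word]
    v = v / np.linalg.norm(v)
    best, best_sim = None, -1.0
    for tgt, u in tgt_emb.items():
        sim = float(v @ (u / np.linalg.norm(u)))
        if sim > best_sim:
            best, best_sim = tgt, sim
    return best

print([translate_word(w) for w in ["chat", "chien"]])  # ['cat', 'dog']
```

A translation produced this way is only a rough starting point, but it gives the unsupervised NMT system an initial signal to refine through iterative back-translation.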
Tags
Deep Learning (in Machine learning)
Data Science