Bilingual alignment as an unsupervised NMT component to exploit monolingual data
Bilingual alignment addresses the open problem of how to obtain an initial alignment between two languages. Bilingual word embeddings let an NMT system either start from a word-by-word translation derived from the shared embedding space or initialize its embedding parameters from the bilingual embeddings. A denoising auto-encoder (DAE) can build a shared latent space for the two languages by learning to reconstruct sentences in each language from a noised version. In unsupervised statistical machine translation, the initial alignment can be obtained from back-translated corpora generated by an unsupervised NMT system. Language-model pre-training can also serve as an initialization.
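As a minimal sketch of the word-by-word translation idea: once source and target embeddings have been mapped into a shared space (e.g. by an unsupervised alignment method), each source word can be translated to its nearest target word by cosine similarity. The toy vectors and the names `src_emb`, `tgt_emb`, and `translate_word` below are illustrative assumptions, not part of any specific system.

```python
import numpy as np

# Toy bilingual embeddings, assumed to already live in a shared space
# (e.g. after an unsupervised cross-lingual mapping step).
src_emb = {"chat": np.array([0.9, 0.1]), "chien": np.array([0.1, 0.9])}
tgt_emb = {"cat": np.array([0.88, 0.12]), "dog": np.array([0.12, 0.88])}

def translate_word(word):
    """Word-by-word translation: pick the target word whose embedding
    has the highest cosine similarity with the source word's embedding."""
    v = src_emb[word]
    v = v / np.linalg.norm(v)
    best, best_sim = None, -1.0
    for tgt, u in tgt_emb.items():
        sim = float(v @ (u / np.linalg.norm(u)))
        if sim > best_sim:
            best, best_sim = tgt, sim
    return best

print([translate_word(w) for w in ["chat", "chien"]])  # ['cat', 'dog']
```

A translation produced this way is only a rough starting point, but it gives the unsupervised NMT system an initial signal to refine through iterative back-translation.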
Tags
Deep Learning (in Machine learning)
Data Science