Enhancing with bilingual dictionary for leveraging monolingual data for Low-Resource NMT
A bilingual dictionary contains word-level parallel information, which helps align two languages. For a low-resource language pair, a bilingual dictionary can be collected either by human annotation or by word-embedding-based alignment, and it is much easier to obtain than parallel sentences. Because the dictionary carries only word-level information, it is usually combined with monolingual data to improve translation, and this can be done in several ways. Pseudo-parallel sentences can be built by translating source-side monolingual sentences into the target language with an SMT system built from the bilingual dictionary. Parallel data can be augmented by replacing some words in parallel sentences with rare words. Bilingual dictionaries can also be used to perform word-by-word translation on monolingual data. Finally, the gap between the source and target embedding spaces can be bridged by using dictionary entries as anchor points.
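As an illustration of the word-by-word option, the following is a minimal sketch, not a reference implementation. It assumes whitespace tokenization, a one-to-one dictionary, and that unknown words are simply copied through. The names `TOY_DICT`, `word_by_word_translate`, and `build_pseudo_parallel` are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy English-to-Spanish dictionary (illustrative assumption only).
TOY_DICT: Dict[str, str] = {"the": "el", "cat": "gato", "sleeps": "duerme"}


def word_by_word_translate(sentence: str, dictionary: Dict[str, str]) -> str:
    """Translate each whitespace-separated token by dictionary lookup.

    Tokens are lowercased for the lookup; tokens missing from the
    dictionary are copied to the output unchanged.
    """
    tokens = sentence.split()
    return " ".join(dictionary.get(tok.lower(), tok) for tok in tokens)


def build_pseudo_parallel(
    monolingual: List[str], dictionary: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Pair every source sentence with its word-by-word translation."""
    return [(src, word_by_word_translate(src, dictionary)) for src in monolingual]


pairs = build_pseudo_parallel(["the cat sleeps", "the dog sleeps"], TOY_DICT)
for src, tgt in pairs:
    print(f"{src}\t{tgt}")
```

The resulting pairs are noisy, since word order and morphology are ignored, so in practice they would typically be mixed with whatever genuine parallel data is available rather than used alone.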
Tags
Deep Learning (in Machine learning)
Data Science
Related
Back translation and forward translation for leveraging monolingual data for Low-Resource NMT
Joint training on both translation directions for leveraging monolingual data for Low-Resource NMT
Unsupervised NMT for leveraging monolingual data for Low-Resource NMT
Language model pre-training for leveraging monolingual data for Low-Resource NMT
Exploiting comparable corpus for leveraging monolingual data for Low-Resource NMT