Concept

Using bilingual dictionaries to leverage monolingual data for Low-Resource NMT

A bilingual dictionary contains word-level parallel information, which helps align two languages. For a low-resource language pair, a bilingual dictionary can be collected either by human annotation or by word-embedding-based alignment, both of which are much easier than obtaining bilingual parallel sentences. Since a bilingual dictionary contains only word-level information, it is usually combined with monolingual data to improve translation. Pseudo-parallel sentences can be built by translating source-side monolingual sentences into the target language with a statistical machine translation (SMT) system that is itself built from the bilingual dictionary. Parallel data can be augmented by replacing some words in parallel sentences with rare words. Furthermore, bilingual dictionaries can be used to perform word-by-word translation on monolingual data. Finally, the gap between the embedding spaces of the source and target languages can be bridged by using dictionary entries as anchor points.
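The word-by-word translation idea can be sketched as follows. This is a minimal illustration, not the method of any particular paper: the dictionary entries, the function names `word_by_word_translate` and `build_pseudo_parallel`, and the choice to copy unknown words unchanged are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary (English -> French), for illustration only.
TOY_DICT: Dict[str, str] = {"the": "le", "cat": "chat", "sleeps": "dort"}


def word_by_word_translate(sentence: str, dictionary: Dict[str, str]) -> str:
    """Translate each whitespace-separated token with the dictionary.

    Tokens missing from the dictionary are copied unchanged (an assumption;
    a real system might drop them or mark them as unknown instead).
    """
    tokens = sentence.lower().split()
    return " ".join(dictionary.get(tok, tok) for tok in tokens)


def build_pseudo_parallel(
    monolingual: List[str], dictionary: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Pair each source-side monolingual sentence with its word-by-word translation."""
    return [(s, word_by_word_translate(s, dictionary)) for s in monolingual]


if __name__ == "__main__":
    for src, tgt in build_pseudo_parallel(["The cat sleeps", "the dog sleeps"], TOY_DICT):
        print(src, "=>", tgt)
```

The resulting pairs ignore word order and morphology, so they are noisy; in practice they would typically be mixed with whatever real parallel data is available rather than used alone.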


Updated 2022-05-29


Tags

Deep Learning (in Machine learning)

Data Science