MT Corpora
Machine translation models are trained on a parallel corpus, sometimes called a bitext: a text that appears in two (or more) languages. Some examples of parallel corpora are:
- The Europarl Corpus, extracted from the proceedings of the European Parliament
- The United Nations Parallel Corpus, extracted from official records and other parliamentary documents of the United Nations
- The OpenSubtitles Corpus, extracted from movie and TV subtitles
- The ParaCrawl Corpus, extracted from general web text
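To make the idea of a bitext concrete, here is a minimal Python sketch of how a parallel corpus might be represented as aligned sentence pairs. It assumes the two sides are line-aligned, so that line i on one side is the translation of line i on the other. The function name `read_bitext` and the toy English and German sentences are illustrative assumptions, not taken from any of the corpora listed above.

```python
from typing import Iterable, Iterator, Tuple


def read_bitext(src_lines: Iterable[str],
                tgt_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs from two line-aligned sequences.

    Assumption: line i of src_lines is the translation of line i of
    tgt_lines. Pairs where either side is empty are skipped.
    """
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:
            yield src, tgt


# Toy data invented for illustration (not real corpus content).
english = ["Resumption of the session", "", "Thank you."]
german = ["Wiederaufnahme der Sitzung", "", "Vielen Dank."]

pairs = list(read_bitext(english, german))
for en, de in pairs:
    print(f"{en}\t{de}")
```

In practice the two arguments could be open file handles for the two language sides of a corpus. Note that `zip` stops at the shorter input, so a real loader would probably also want to check that both sides have the same number of lines.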
Tags
Data Science
Related
Application of autoregressive generation given a prefix: Machine translation
Statistical Machine Translation vs Neural Machine Translation
Backtranslation
MT Evaluation
Assessing Translation Effectiveness for a Specific Use Case
A company is developing a translation service for legal documents, where preserving the precise meaning and complex sentence structure of the original text is the highest priority. The company has access to a massive parallel corpus of legal texts. Given these requirements, which approach would be more suitable and why?
Evaluating Machine Translation Quality
Unaligned Data in Sequence Learning