
MT Corpora

Machine translation models are trained on a parallel corpus, sometimes called a bitext: a text that appears in two (or more) languages, usually aligned so that each sentence is paired with its translation. Some examples of parallel corpora are:

  • The Europarl Corpus, extracted from the proceedings of the European Parliament
  • The United Nations Parallel Corpus, extracted from official records and other parliamentary documents of the United Nations
  • The OpenSubtitles Corpus, extracted from movie and TV subtitles
  • The ParaCrawl Corpus, extracted from general web text
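
To make the idea of a bitext concrete, here is a minimal Python sketch of how a sentence-aligned parallel corpus might be held in memory. It assumes the common convention of two line-aligned lists, where line i on the source side translates line i on the target side. The function name `pair_sentences` and the sample sentences are hypothetical, made up for illustration, and not taken from any of the corpora listed above.

```python
from typing import List, Tuple


def pair_sentences(source_lines: List[str], target_lines: List[str]) -> List[Tuple[str, str]]:
    """Zip two line-aligned sides of a bitext into (source, target) pairs.

    Assumes line i of source_lines is the translation of line i of
    target_lines. Pairs where either side is empty are skipped.
    """
    if len(source_lines) != len(target_lines):
        raise ValueError("Both sides of a bitext must have the same number of lines")

    pairs = []
    for src, tgt in zip(source_lines, target_lines):
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:  # drop pairs with a missing side
            pairs.append((src, tgt))
    return pairs


# Illustrative sentences only (hypothetical data).
english = ["Resumption of the session.", "The sitting is closed.", ""]
french = ["Reprise de la session.", "La séance est levée.", ""]

bitext = pair_sentences(english, french)
for src, tgt in bitext:
    print(f"{src}\t{tgt}")
```

Each tuple in `bitext` is one training example: the model reads the source sentence and learns to produce the target sentence. Real corpora usually need more cleaning than this (deduplication, length filtering, language identification), which the sketch leaves out.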

Updated 2021-12-05

Tags

Data Science