Postprocessing

We first filter out sentences that consist of more than 50% punctuation. We then deduplicate the data and remove any sentence that appears in any validation or test dataset, even if it is associated with another language pair.
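The three steps above could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `punctuation_ratio` and `postprocess` are hypothetical, and the sketch assumes that the 50% threshold is measured over non-whitespace characters, that deduplication operates on exact (source, target) pairs, and that evaluation overlap is detected by exact string match.

```python
import unicodedata


def punctuation_ratio(sentence: str) -> float:
    """Fraction of non-whitespace characters that are Unicode punctuation.

    Assumption: the ratio is character-based; the source text does not say
    whether it is computed over characters or tokens.
    """
    chars = [c for c in sentence if not c.isspace()]
    if not chars:
        return 1.0  # treat empty sentences as noise
    punct = sum(1 for c in chars if unicodedata.category(c).startswith("P"))
    return punct / len(chars)


def postprocess(pairs, eval_sentences, max_punct_ratio=0.5):
    """Filter, deduplicate, and decontaminate sentence pairs.

    pairs: iterable of (source, target) strings.
    eval_sentences: set of every sentence found in any validation or test
        set, pooled across all language pairs.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        # 1. Drop pairs where either side is mostly punctuation.
        if (punctuation_ratio(src) > max_punct_ratio
                or punctuation_ratio(tgt) > max_punct_ratio):
            continue
        # 2. Drop exact duplicates (assumed: pair-level deduplication).
        if (src, tgt) in seen:
            continue
        # 3. Drop pairs that overlap with any evaluation set, regardless
        #    of which language pair that set belongs to.
        if src in eval_sentences or tgt in eval_sentences:
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

Pooling all evaluation sentences into a single set keeps the overlap check cheap and makes it independent of the language pair, which matches the requirement that a sentence is removed even when it comes from another pair's validation or test data.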

Finally, we apply length and language-specific filtering.

Updated 2022-06-05
