Postprocessing
We apply a filtering step to remove sentences that consist of more than 50% punctuation. The data is then deduplicated, and we remove any sentence that appears in any validation or test dataset, even if it is associated with another language pair.
Finally, we apply length and language-specific filtering.
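
The punctuation filter, deduplication, and held-out removal described above can be sketched roughly as follows. This is an illustrative sketch only, not the original pipeline: the function names `punctuation_ratio` and `postprocess` are hypothetical, punctuation is assumed to mean characters in the Unicode "P" categories measured over non-whitespace characters, and duplicates and held-out matches are assumed to be detected by exact string comparison after stripping surrounding whitespace. The final length and language-specific filtering is not shown, since the text does not specify it.

```python
import unicodedata
from typing import Iterable, List


def punctuation_ratio(sentence: str) -> float:
    """Fraction of non-whitespace characters that are Unicode punctuation.

    Assumption: "punctuation" means any character whose Unicode category
    starts with "P"; the source does not define it.
    """
    chars = [c for c in sentence if not c.isspace()]
    if not chars:
        # Treat empty or whitespace-only lines as pure punctuation so they are dropped.
        return 1.0
    punct = sum(1 for c in chars if unicodedata.category(c).startswith("P"))
    return punct / len(chars)


def postprocess(
    sentences: Iterable[str],
    heldout_sentences: Iterable[str],
    max_punct_ratio: float = 0.5,
) -> List[str]:
    """Apply the punctuation filter, deduplication, and held-out removal."""
    # Pool validation/test sentences from every language pair into one set
    # (assumption: exact match after stripping surrounding whitespace).
    heldout = {s.strip() for s in heldout_sentences}
    seen = set()
    kept = []
    for raw in sentences:
        sentence = raw.strip()
        # Drop sentences that are more than 50% punctuation.
        if punctuation_ratio(sentence) > max_punct_ratio:
            continue
        # Drop anything that also occurs in a validation or test set.
        if sentence in heldout:
            continue
        # Keep only the first occurrence of each sentence.
        if sentence in seen:
            continue
        seen.add(sentence)
        kept.append(sentence)
    return kept
```

Pooling the held-out sentences into a single set, regardless of which language pair they came from, mirrors the rule that a sentence is removed even when it belongs to an evaluation set for a different pair.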
Updated 2022-06-05