Concept
Processing Idiom Data
A list of 225 English idioms is fed into a pattern-matching tool, which extracts and annotates the translation pairs that contain an idiom on the source side. Sentence pairs whose idiom occurs only once in the data are discarded. The regular (non-idiom) data is used for training, while the idiom data is divided evenly into idiom-train and idiom-test sets: half of each idiom’s sentence pairs go into the training set and the other half into the test set.
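The split described above can be sketched roughly as follows. This is a minimal illustration, not the original pipeline: the function name `split_idiom_data`, the `(source, target)` tuple layout, and the fixed random seed are all hypothetical. The pattern-matching tool is not specified in the text, so a naive case-insensitive substring match stands in for it. Sending the extra pair to the test set when an idiom has an odd number of pairs is also an assumption.

```python
import random
from collections import defaultdict


def split_idiom_data(pairs, idioms, seed=0):
    """Split (source, target) pairs into regular, idiom-train and idiom-test lists.

    Idiom pairs are returned as (source, target, idiom) so the annotation is kept.
    """
    by_idiom = defaultdict(list)
    regular = []
    for src, tgt in pairs:
        # Stand-in for the pattern-matching tool (assumption):
        # first idiom found as a case-insensitive substring of the source.
        match = next((i for i in idioms if i.lower() in src.lower()), None)
        if match is None:
            regular.append((src, tgt))
        else:
            by_idiom[match].append((src, tgt, match))

    rng = random.Random(seed)
    idiom_train, idiom_test = [], []
    for items in by_idiom.values():
        if len(items) < 2:
            continue  # idiom occurs only once: discard its sentence pair
        rng.shuffle(items)
        half = len(items) // 2  # odd counts: extra pair goes to test (assumption)
        idiom_train.extend(items[:half])
        idiom_test.extend(items[half:])
    return regular, idiom_train, idiom_test
```

Grouping by idiom before splitting is what keeps the division even per idiom, rather than only even over the idiom data as a whole.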
Updated 2023-02-17
Tags
Data Science