
Steps of a generic data mining pipeline

(1) a large corpus of text is preprocessed and divided into different languages,

(2) candidate pairs of aligned sentences are embedded and stored in an index,

(3) indexed sentences are compared to form potential pairs,

(4) the resulting candidate pairs are filtered in post-processing.
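The four steps can be sketched in a few lines of Python. This is a toy illustration only, not any particular system's implementation: the corpus is assumed to arrive already tagged with a language code, the `CONCEPTS` lexicon is a hypothetical stand-in for a real multilingual sentence encoder, the "index" is a plain list searched by brute force, and the 0.9 similarity threshold is an arbitrary assumption.

```python
import math
from collections import defaultdict

# Hypothetical toy lexicon mapping words to shared concept ids.
# It stands in for a real multilingual sentence encoder.
CONCEPTS = {
    "cat": 0, "chat": 0,
    "dog": 1, "chien": 1,
    "sleeps": 2, "dort": 2,
    "eats": 3, "mange": 3,
}
DIM = 4


def preprocess(corpus):
    """Step 1: normalize each sentence and group sentences by language."""
    by_lang = defaultdict(list)
    for lang, text in corpus:
        cleaned = " ".join(text.lower().replace(".", " ").split())
        if cleaned:
            by_lang[lang].append(cleaned)
    return by_lang


def embed(sentence):
    """Map a sentence to a unit-length bag-of-concepts vector."""
    vec = [0.0] * DIM
    for word in sentence.split():
        if word in CONCEPTS:
            vec[CONCEPTS[word]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(sentences):
    """Step 2: embed sentences and store them in a (brute-force) index."""
    return [(s, embed(s)) for s in sentences]


def cosine(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def compare(src_index, tgt_index):
    """Step 3: pair each source sentence with its most similar target."""
    if not tgt_index:
        return []
    pairs = []
    for src, src_vec in src_index:
        tgt, tgt_vec = max(tgt_index, key=lambda item: cosine(src_vec, item[1]))
        pairs.append((src, tgt, cosine(src_vec, tgt_vec)))
    return pairs


def filter_pairs(pairs, threshold=0.9):
    """Step 4: keep only candidate pairs whose score clears the threshold."""
    return [p for p in pairs if p[2] >= threshold]


def mine(corpus, src_lang, tgt_lang, threshold=0.9):
    """Run all four steps on a list of (language, sentence) tuples."""
    by_lang = preprocess(corpus)
    src_index = build_index(by_lang[src_lang])
    tgt_index = build_index(by_lang[tgt_lang])
    return filter_pairs(compare(src_index, tgt_index), threshold)


if __name__ == "__main__":
    corpus = [
        ("en", "The cat sleeps."),
        ("en", "The dog eats."),
        ("fr", "Le chien mange."),
        ("fr", "Le chat dort."),
        ("en", "Hello there."),
    ]
    for src, tgt, score in mine(corpus, "en", "fr"):
        print(f"{score:.2f}  {src}  <->  {tgt}")
```

In a realistic setting, the language split would likely come from a language-identification model, the embeddings from a trained multilingual encoder, and the index from an approximate nearest-neighbor library. The brute-force comparison here is only workable for very small corpora.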


Updated 2022-06-05

Tags

Science