Concept

General Evaluation Benchmark

An essential issue for the NLP community is how to evaluate PTMs against a comparable metric. Thus, a large-scale benchmark is necessary.

The General Language Understanding Evaluation (GLUE) benchmark is a collection of nine natural language understanding tasks, including single-sentence classification tasks, pairwise text classification tasks, a text similarity task, and a relevance ranking task. The GLUE benchmark is well designed for evaluating both the robustness and the generalization of models.
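As a concrete illustration (not from the original text), the sketch below shows one common way to access a GLUE task and its metric using the Hugging Face `datasets` and `evaluate` libraries; the choice of the SST-2 task and the dummy predictions are assumptions made purely for the example.

```python
# Minimal sketch: loading a GLUE task and its metric with Hugging Face libraries.
# The task name ("sst2") and the dummy predictions are illustrative assumptions.
from datasets import load_dataset
import evaluate

# SST-2 is GLUE's single-sentence sentiment classification task.
sst2 = load_dataset("glue", "sst2")
print(sst2["validation"][0])  # e.g. {'sentence': ..., 'label': 0 or 1, 'idx': ...}

# Each GLUE task comes with its own metric (accuracy, F1, Matthews corr., ...).
metric = evaluate.load("glue", "sst2")

# Dummy predictions just to illustrate the metric API.
refs = sst2["validation"]["label"][:8]
preds = refs  # pretend the model is perfect for this illustration
print(metric.compute(predictions=preds, references=refs))
```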


Updated 2022-05-27

Tags

Data Science