General Evaluation Benchmark
An essential question for the NLP community is how to evaluate pre-trained models (PTMs) on comparable metrics; a large-scale benchmark is therefore necessary.
The General Language Understanding Evaluation (GLUE) benchmark is a collection of nine natural language understanding tasks, including single-sentence classification tasks, pairwise text classification tasks, a text similarity task, and a relevance ranking task. The GLUE benchmark is well designed for evaluating the robustness and generalization of models.
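As a concrete illustration, here is a minimal sketch of inspecting one GLUE task, assuming the Hugging Face `datasets` library as the access method (GLUE itself is distributed through several channels, so the library choice is an assumption, not part of the benchmark):

```python
# Minimal sketch: load and inspect a GLUE task with Hugging Face `datasets`.
from datasets import load_dataset

# SST-2 is GLUE's single-sentence sentiment classification task.
sst2 = load_dataset("glue", "sst2")

# Each example pairs a sentence with a binary label.
print(sst2["train"][0])                        # {'sentence': ..., 'label': 0 or 1, 'idx': ...}
print(sst2["train"].features["label"].names)   # ['negative', 'positive']
```

The other GLUE tasks (e.g., MRPC, QNLI, STS-B) load the same way by swapping the second argument, which is what makes the benchmark convenient for comparing models across task types.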
Tags
Data Science
Related
General Evaluation Benchmark
Named Entity Recognition
Text Regression with BERT Models
Single-Text Classification with BERT Models
Selecting the Appropriate NLP Task for a Business Need
Match each description of a natural language processing task with the most appropriate application name.
A company uses a fine-tuned pre-trained model to automatically process thousands of customer product reviews. When a review states, 'I am extremely disappointed with this purchase; it stopped working after just one use,' the system assigns it a 'Negative' label. Which primary application of a pre-trained model does this system exemplify?
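For reference, the system described above can be sketched in a few lines with the Hugging Face `transformers` pipeline API; the default checkpoint it downloads is an assumption here, as any fine-tuned sentiment classifier would serve:

```python
# Hedged sketch of the quiz scenario: a fine-tuned PTM labeling a review.
from transformers import pipeline

# "sentiment-analysis" loads a default fine-tuned classifier (assumed checkpoint).
classifier = pipeline("sentiment-analysis")

review = ("I am extremely disappointed with this purchase; "
          "it stopped working after just one use.")
print(classifier(review))  # e.g. [{'label': 'NEGATIVE', 'score': 0.99...}]
```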