1Cademy - Problems with evaluation questions

Learn Before

Human evaluation in natural language generation (NLG)

Concept

Problems with evaluation questions

Even papers on closely related topics tend to use their own evaluation methods rather than building on existing research. Where an evaluation is based on earlier work, that earlier work does not motivate its evaluation questions either. This type of evaluation has come to be known as a symptom of the Great Misalignment Problem.

Apart from the evaluation questions not aligning with the model, a much larger problem with evaluation questions can be identified. Most of the papers were not clear about the actual evaluation questions used. Instead, they listed the evaluated parameters as though human evaluation were an automated procedure in which abstract notions such as typicality or fluency can simply be scored accurately on a 5-point scale.

Updated 2022-07-30

Contributors are:

Mingyu Li

Who are from:

University of Michigan - Ann Arbor

References

Human Evaluation of Creative NLG Systems: An Interdisciplinary Survey on Recent Papers
