The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

doi:10.60692/s7zps-d3a13

Published January 1, 2021 | Version v1

Publication | Open access

1. Google (United States)
2. Luleå University of Technology
3. Indraprastha Institute of Information Technology Delhi
4. Indian Institute of Technology Delhi
5. Indian Institute of Technology Hyderabad
6. University of Lagos
7. Stanford University
8. Carnegie Mellon University
9. Heriot-Watt University Malaysia
10. University of Edinburgh
11. University of Virginia
12. Cornell University
13. Charles University
14. Technical University of Munich
15. Michigan United
16. University of Michigan–Ann Arbor
17. Johns Hopkins University
18. German Research Centre for Artificial Intelligence
19. University of Kaiserslautern
20. University of Waterloo
21. Columbia University
22. Atlanta Technical College
23. Georgia Institute of Technology
24. University of North Carolina at Charlotte
25. University of California, San Diego
26. Instituto de Telecomunicações
27. University of Washington
28. Pompeu Fabra University
29. Tilburg University
30. Microsoft (United States)
31. Massachusetts Institute of Technology
32. Kwame Nkrumah University
33. Kwame Nkrumah University of Science and Technology
34. National Institute of Technology Karnataka
35. The University of Texas at Austin
36. New York University
37. Université de Lorraine
38. Universidade de São Paulo
39. Intelligent Systems Research (United States)
40. Samsung (United States)
41. Harvard University Press

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Because of this moving target, new models are often still evaluated on divergent Anglo-centric corpora with well-established but flawed metrics. This disconnect makes it hard to identify both the limitations of current models and the opportunities for progress. To address this, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and let the challenge evolve alongside the models. This paper describes the data for the shared task we are organizing at our ACL 2021 Workshop, in which we invite the entire NLG community to participate.

Files

Files (1.3 MB)

Name	Size	MD5
2021.gem-1.10.pdf.pdf	1.3 MB	22259ed01d96d44386deb543a102cd39

Additional details

Other: https://openalex.org/W3186655327
DOI: 10.18653/v1/2021.gem-1.10

Is Global South Knowledge: Yes
Country: Nigeria
