{"id":25754,"date":"2023-03-09T10:48:19","date_gmt":"2023-03-09T09:48:19","guid":{"rendered":"https:\/\/kinit.sk\/publication\/average-is-not-enough-caveats-of-multilingual-evaluation-2\/"},"modified":"2025-04-08T14:03:42","modified_gmt":"2025-04-08T12:03:42","slug":"average-is-not-enough-caveats-of-multilingual-evaluation-2","status":"publish","type":"publication","link":"https:\/\/kinit.sk\/sk\/publikacia\/average-is-not-enough-caveats-of-multilingual-evaluation-2\/","title":{"rendered":"Average Is Not Enough: Caveats of Multilingual Evaluation"},"content":{"rendered":"<div id=\"\" class=\"element core-paragraph\">\n<p><strong>Pikuliak, M., Simko, M.<\/strong><\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p>This position paper discusses the problem of multilingual evaluation. Using simple statistics, such as average language performance, might inject linguistic biases in favor of dominant language families into evaluation methodology. We argue that a qualitative analysis informed by comparative linguistics is needed for multilingual results to detect this kind of bias. We show in our case study that results in published works can indeed be linguistically biased and we demonstrate that visualization based on the URIEL typological database can detect it.<\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p>Cite: Pikuliak, M., Simko, M. Average Is Not Enough: Caveats of Multilingual Evaluation. In Proceedings of the 2nd Workshop on Multi-lingual Representation Learning (MRL), pages 125\u2013133, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics (2022). 
<a href=\"https:\/\/doi.org\/10.18653\/v1\/2022.mrl-1.13\">DOI: 10.18653\/v1\/2022.mrl-1.13<\/a>.<\/p>\n<\/div>","protected":false,"featured_media":0,"template":"","meta":{"_acf_changed":false,"footnotes":""},"categories":[76,349],"class_list":["post-25754","publication","type-publication","status-publish","hentry","category-natural-language-processing-sk","category-2022-sk"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Average Is Not Enough: Caveats of Multilingual Evaluation - KInIT<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/kinit.sk\/sk\/publikacia\/average-is-not-enough-caveats-of-multilingual-evaluation-2\/\" \/>\n<meta property=\"og:locale\" content=\"sk_SK\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Average Is Not Enough: Caveats of Multilingual Evaluation - KInIT\" \/>\n<meta property=\"og:description\" content=\"Pikuliak, M., Simko, M. This position paper discusses the problem of multilingual evaluation. 
Using simple statistics, such as average language performance, might inject linguistic biases in favor of dominant language...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/kinit.sk\/sk\/publikacia\/average-is-not-enough-caveats-of-multilingual-evaluation-2\/\" \/>\n<meta property=\"og:site_name\" content=\"KInIT\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-08T12:03:42+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/kinit.sk\/wp-content\/uploads\/2021\/03\/KINIT_Sharepic.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@kinit\" \/>\n<meta name=\"twitter:label1\" content=\"Predpokladan\u00fd \u010das \u010d\u00edtania\" \/>\n\t<meta name=\"twitter:data1\" content=\"1 min\u00fata\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/publikacia\\\/average-is-not-enough-caveats-of-multilingual-evaluation-2\\\/\",\"url\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/publikacia\\\/average-is-not-enough-caveats-of-multilingual-evaluation-2\\\/\",\"name\":\"Average Is Not Enough: Caveats of Multilingual Evaluation - 
KInIT\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/#website\"},\"datePublished\":\"2023-03-09T09:48:19+00:00\",\"dateModified\":\"2025-04-08T12:03:42+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/publikacia\\\/average-is-not-enough-caveats-of-multilingual-evaluation-2\\\/#breadcrumb\"},\"inLanguage\":\"sk-SK\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/kinit.sk\\\/sk\\\/publikacia\\\/average-is-not-enough-caveats-of-multilingual-evaluation-2\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/publikacia\\\/average-is-not-enough-caveats-of-multilingual-evaluation-2\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/uvod\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Natural Language Processing\",\"item\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/category\\\/natural-language-processing-sk\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Average Is Not Enough: Caveats of Multilingual Evaluation\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/#website\",\"url\":\"https:\\\/\\\/kinit.sk\\\/\",\"name\":\"KInIT\",\"description\":\"Vyu\u017e\u00edvame v\u00fdskum pre \u013eud\u00ed a priemysel\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/kinit.sk\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"sk-SK\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Average Is Not Enough: Caveats of Multilingual Evaluation - KInIT","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/kinit.sk\/sk\/publikacia\/average-is-not-enough-caveats-of-multilingual-evaluation-2\/","og_locale":"sk_SK","og_type":"article","og_title":"Average Is Not Enough: Caveats of Multilingual Evaluation - KInIT","og_description":"Pikuliak, M., Simko, M. This position paper discusses the problem of multilingual evaluation. Using simple statistics, such as average language performance, might inject linguistic biases in favor of dominant language...","og_url":"https:\/\/kinit.sk\/sk\/publikacia\/average-is-not-enough-caveats-of-multilingual-evaluation-2\/","og_site_name":"KInIT","article_modified_time":"2025-04-08T12:03:42+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/kinit.sk\/wp-content\/uploads\/2021\/03\/KINIT_Sharepic.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_site":"@kinit","twitter_misc":{"Predpokladan\u00fd \u010das \u010d\u00edtania":"1 min\u00fata"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/kinit.sk\/sk\/publikacia\/average-is-not-enough-caveats-of-multilingual-evaluation-2\/","url":"https:\/\/kinit.sk\/sk\/publikacia\/average-is-not-enough-caveats-of-multilingual-evaluation-2\/","name":"Average Is Not Enough: Caveats of Multilingual Evaluation - 
KInIT","isPartOf":{"@id":"https:\/\/kinit.sk\/#website"},"datePublished":"2023-03-09T09:48:19+00:00","dateModified":"2025-04-08T12:03:42+00:00","breadcrumb":{"@id":"https:\/\/kinit.sk\/sk\/publikacia\/average-is-not-enough-caveats-of-multilingual-evaluation-2\/#breadcrumb"},"inLanguage":"sk-SK","potentialAction":[{"@type":"ReadAction","target":["https:\/\/kinit.sk\/sk\/publikacia\/average-is-not-enough-caveats-of-multilingual-evaluation-2\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/kinit.sk\/sk\/publikacia\/average-is-not-enough-caveats-of-multilingual-evaluation-2\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/kinit.sk\/sk\/uvod\/"},{"@type":"ListItem","position":2,"name":"Natural Language Processing","item":"https:\/\/kinit.sk\/sk\/category\/natural-language-processing-sk\/"},{"@type":"ListItem","position":3,"name":"Average Is Not Enough: Caveats of Multilingual Evaluation"}]},{"@type":"WebSite","@id":"https:\/\/kinit.sk\/#website","url":"https:\/\/kinit.sk\/","name":"KInIT","description":"Vyu\u017e\u00edvame v\u00fdskum pre \u013eud\u00ed a 
priemysel","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/kinit.sk\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"sk-SK"}]}},"_links":{"self":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/publication\/25754","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/publication"}],"about":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/types\/publication"}],"version-history":[{"count":5,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/publication\/25754\/revisions"}],"predecessor-version":[{"id":35959,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/publication\/25754\/revisions\/35959"}],"wp:attachment":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/media?parent=25754"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/categories?post=25754"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}