{"id":41345,"date":"2026-05-14T14:32:48","date_gmt":"2026-05-14T12:32:48","guid":{"rendered":"https:\/\/kinit.sk\/?post_type=publication&#038;p=41345"},"modified":"2026-05-14T14:32:52","modified_gmt":"2026-05-14T12:32:52","slug":"better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification","status":"publish","type":"publication","link":"https:\/\/kinit.sk\/sk\/publikacia\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\/","title":{"rendered":"Better as Generators Than Classifiers: Leveraging\u00a0LLMs and Synthetic Data for Low-Resource Multilingual Classification"},"content":{"rendered":"<div id=\"\" class=\"element core-paragraph\">\n<p>Large Language Models (LLMs) have demonstrated remarkable multilingual capabilities, making them promising tools in both high- and low-resource languages. One particularly valuable use case is generating synthetic samples that can be used to train smaller models in low-resource scenarios where human-labelled data is scarce. In this work, we investigate whether these synthetic data generation capabilities can serve as a form of distillation, producing smaller models that perform on par with or even better than massive LLMs across languages and tasks. To this end, we use a state-of-the-art multilingual LLM to generate synthetic datasets covering 11 languages and 4 classification tasks. These datasets are then used to train smaller models via fine-tuning or instruction tuning, or as synthetic in-context examples for compact LLMs. Our experiments show that even small amounts of synthetic data enable smaller models to outperform the large generator itself, particularly in low-resource languages. Overall, the results suggest that LLMs are best utilised as generators (teachers) rather than classifiers, producing data that empowers smaller and more efficient multilingual models.<\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p><strong>Cite<\/strong>: <em>Branislav Pecher, Jan Cegin, Robert Belanec, Ivan Srba, Jakub Simko, and Maria Bielikova. 2026.\u00a0<a href=\"https:\/\/aclanthology.org\/2026.findings-eacl.148\/\">Better as Generators Than Classifiers: Leveraging LLMs and Synthetic Data for Low-Resource Multilingual Classification<\/a>. In\u00a0<em>Findings of the Association for Computational Linguistics: EACL 2026<\/em>, pages 2840\u20132857, Rabat, Morocco. Association for Computational Linguistics.<\/em><\/p>\n<\/div>","protected":false},"featured_media":0,"template":"","meta":{"_acf_changed":false,"footnotes":""},"categories":[78,236,542],"class_list":["post-41345","publication","type-publication","status-publish","hentry","category-web-user-data-processing-sk","category-bielikovam-sk","category-2026-sk"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Better as Generators Than Classifiers: Leveraging\u00a0LLMs and Synthetic Data for Low-Resource Multilingual Classification - KInIT<\/title>\n<meta name=\"description\" content=\"Large Language Models (LLMs) have demonstrated remarkable multilingual capabilities, making them promising tools in both high- and...\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/kinit.sk\/sk\/publikacia\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\/\" \/>\n<meta property=\"og:locale\" content=\"sk_SK\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Better as Generators Than Classifiers: Leveraging\u00a0LLMs and Synthetic Data for Low-Resource Multilingual Classification - KInIT\" \/>\n<meta property=\"og:description\" content=\"Large Language Models (LLMs) have demonstrated remarkable multilingual capabilities, making them promising tools in both high- and...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/kinit.sk\/sk\/publikacia\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\/\" \/>\n<meta property=\"og:site_name\" content=\"KInIT\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-14T12:32:52+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/kinit.sk\/wp-content\/uploads\/2021\/03\/KINIT_Sharepic.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@kinit\" \/>\n<meta name=\"twitter:label1\" content=\"Predpokladan\u00fd \u010das \u010d\u00edtania\" \/>\n\t<meta name=\"twitter:data1\" content=\"1 min\u00fata\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/publikacia\\\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\\\/\",\"url\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/publikacia\\\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\\\/\",\"name\":\"Better as Generators Than Classifiers: Leveraging\u00a0LLMs and Synthetic Data for Low-Resource Multilingual Classification - KInIT\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/#website\"},\"datePublished\":\"2026-05-14T12:32:48+00:00\",\"dateModified\":\"2026-05-14T12:32:52+00:00\",\"description\":\"Large Language Models (LLMs) have demonstrated remarkable multilingual capabilities, making them promising tools in both high- and...\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/publikacia\\\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\\\/#breadcrumb\"},\"inLanguage\":\"sk-SK\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/kinit.sk\\\/sk\\\/publikacia\\\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/publikacia\\\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"BielikovaM\",\"item\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/category\\\/bielikovam-sk\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Better as Generators Than Classifiers: Leveraging\u00a0LLMs and Synthetic Data for Low-Resource Multilingual Classification\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/#website\",\"url\":\"https:\\\/\\\/kinit.sk\\\/\",\"name\":\"KInIT\",\"description\":\"Vyu\u017e\u00edvame v\u00fdskum pre \u013eud\u00ed a priemysel\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/kinit.sk\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"sk-SK\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Better as Generators Than Classifiers: Leveraging\u00a0LLMs and Synthetic Data for Low-Resource Multilingual Classification - KInIT","description":"Large Language Models (LLMs) have demonstrated remarkable multilingual capabilities, making them promising tools in both high- and...","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/kinit.sk\/sk\/publikacia\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\/","og_locale":"sk_SK","og_type":"article","og_title":"Better as Generators Than Classifiers: Leveraging\u00a0LLMs and Synthetic Data for Low-Resource Multilingual Classification - KInIT","og_description":"Large Language Models (LLMs) have demonstrated remarkable multilingual capabilities, making them promising tools in both high- and...","og_url":"https:\/\/kinit.sk\/sk\/publikacia\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\/","og_site_name":"KInIT","article_modified_time":"2026-05-14T12:32:52+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/kinit.sk\/wp-content\/uploads\/2021\/03\/KINIT_Sharepic.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_site":"@kinit","twitter_misc":{"Predpokladan\u00fd \u010das \u010d\u00edtania":"1 min\u00fata"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/kinit.sk\/sk\/publikacia\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\/","url":"https:\/\/kinit.sk\/sk\/publikacia\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\/","name":"Better as Generators Than Classifiers: Leveraging\u00a0LLMs and Synthetic Data for Low-Resource Multilingual Classification - KInIT","isPartOf":{"@id":"https:\/\/kinit.sk\/#website"},"datePublished":"2026-05-14T12:32:48+00:00","dateModified":"2026-05-14T12:32:52+00:00","description":"Large Language Models (LLMs) have demonstrated remarkable multilingual capabilities, making them promising tools in both high- and...","breadcrumb":{"@id":"https:\/\/kinit.sk\/sk\/publikacia\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\/#breadcrumb"},"inLanguage":"sk-SK","potentialAction":[{"@type":"ReadAction","target":["https:\/\/kinit.sk\/sk\/publikacia\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/kinit.sk\/sk\/publikacia\/better-as-generators-than-classifiers-leveraging-llms-and-synthetic-data-for-low-resource-multilingual-classification\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/kinit.sk\/sk\/"},{"@type":"ListItem","position":2,"name":"BielikovaM","item":"https:\/\/kinit.sk\/sk\/category\/bielikovam-sk\/"},{"@type":"ListItem","position":3,"name":"Better as Generators Than Classifiers: Leveraging\u00a0LLMs and Synthetic Data for Low-Resource Multilingual Classification"}]},{"@type":"WebSite","@id":"https:\/\/kinit.sk\/#website","url":"https:\/\/kinit.sk\/","name":"KInIT","description":"Vyu\u017e\u00edvame v\u00fdskum pre \u013eud\u00ed a priemysel","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/kinit.sk\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"sk-SK"}]}},"_links":{"self":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/publication\/41345","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/publication"}],"about":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/types\/publication"}],"version-history":[{"count":1,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/publication\/41345\/revisions"}],"predecessor-version":[{"id":42442,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/publication\/41345\/revisions\/42442"}],"wp:attachment":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/media?parent=41345"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/categories?post=41345"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}