{"id":38146,"date":"2025-09-21T18:53:15","date_gmt":"2025-09-21T16:53:15","guid":{"rendered":"https:\/\/kinit.sk\/publication\/clustering-malware-at-scale-a-first-full-benchmark-study\/"},"modified":"2026-04-23T15:47:59","modified_gmt":"2026-04-23T13:47:59","slug":"clustering-malware-at-scale-a-first-full-benchmark-study","status":"publish","type":"publication","link":"https:\/\/kinit.sk\/sk\/publikacia\/clustering-malware-at-scale-a-first-full-benchmark-study\/","title":{"rendered":"Clustering Malware at\u00a0Scale: A First Full-Benchmark Study"},"content":{"rendered":"<div id=\"\" class=\"element core-paragraph\">\n<p><strong>Mocko, M., \u0160evcech, J., Chud\u00e1, D.<\/strong><\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p>Recent years have shown that malware attacks continue to occur frequently. Malware experts seek to categorize and classify incoming samples to confirm their trustworthiness or prove their maliciousness. Malware clustering is one way to identify groups of related malware samples. Despite the efforts of the community, malware clustering that incorporates benign samples remains under-explored. Moreover, although larger public benchmark malware datasets are available, malware clustering studies have avoided using these datasets in full, often resorting to small datasets with only a few families. It also remains unclear which malware clustering solutions represent the current state of the art. Our study evaluates malware clustering quality and establishes the state of the art on Bodmas and Ember, two large public benchmark malware datasets. Ours is the first malware clustering study performed on entire malware benchmark datasets. We also extend the malware clustering task by incorporating benign samples. Our results indicate that incorporating benign samples does not significantly degrade clustering quality. 
We find significant differences in clustering quality among Ember, Bodmas, and a private industry dataset. Contrary to popular opinion, our top-performing clustering algorithms are K-Means and BIRCH, while DBSCAN and HAC fall behind.<\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph  margin-bottom-0\">\n<p class=\" margin-bottom-0\">Cite: Mocko, M., \u0160evcech, J., Chud\u00e1, D., Clustering Malware at\u00a0Scale: A First Full-Benchmark Study, Availability, Reliability and Security. ARES 2025. Lecture Notes in Computer Science, vol 15993. Springer, Cham. <a href=\"https:\/\/doi.org\/10.1007\/978-3-032-00627-1_12\">https:\/\/doi.org\/10.1007\/978-3-032-00627-1_12<\/a><\/p>\n<\/div>\n\n\n","protected":false},"featured_media":0,"template":"","meta":{"_acf_changed":true,"footnotes":""},"categories":[81,520],"class_list":["post-38146","publication","type-publication","status-publish","hentry","category-data-analytics-for-green-energy-sk","category-2025-sk"],"acf":[]}