{"id":22367,"date":"2022-01-04T11:26:09","date_gmt":"2022-01-04T10:26:09","guid":{"rendered":"https:\/\/kinit.sk\/project\/slovakbert-the-first-public-slovak-language-model\/"},"modified":"2025-12-02T10:05:44","modified_gmt":"2025-12-02T09:05:44","slug":"slovakbert-the-first-public-slovak-language-model","status":"publish","type":"project","link":"https:\/\/kinit.sk\/sk\/projekt\/slovakbert-the-first-public-slovak-language-model\/","title":{"rendered":"SlovakBERT, the first public Slovak neural language model"},"content":{"rendered":"<div id=\"\" class=\"element core-paragraph\">\n<p>KInIT and Gerulata Technologies introduced SlovakBERT, a new language model for Slovak, which will help improve the automatic processing of texts written in Slovak.<\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p>Such models were initially created mainly for English and subsequently for other widely used languages, such as Chinese or French. Models for smaller languages such as Czech and Polish appeared later. Multilingual models are now available as well.<\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p>The model was trained by our partner,&nbsp;<a href=\"https:\/\/www.gerulata.com\/\">Gerulata Technologies<\/a>, while&nbsp;<a href=\"https:\/\/kinit.sk\/sk\/research\/natural-language-processing\/\">our NLP team<\/a> advised on its development and evaluated it scientifically. SlovakBERT learned Slovak from about 20 GB of Slovak text collected from the Web. 
These data are the model's snapshot of what the Slovak language looks like.<\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-separator\">\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n<\/div>\n\n<div id=\"\" class=\"element core-columns\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\"><div id=\"\" class=\"element core-column\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:50%\"><div id=\"\" class=\"element core-image is-style-default\">\n<figure class=\"wp-block-image is-style-default\"><a href=\"https:\/\/slovakbert.kinit.sk\"><img decoding=\"async\" width=\"300\" height=\"189\" data-src=\"https:\/\/kinit.sk\/wp-content\/uploads\/2021\/12\/Screenshot-2021-12-08-at-09.01.19-300x189.png\" alt=\"\" class=\"wp-image-12373 lazyload\" data-srcset=\"https:\/\/kinit.sk\/wp-content\/uploads\/2021\/12\/Screenshot-2021-12-08-at-09.01.19-300x189.png 300w, https:\/\/kinit.sk\/wp-content\/uploads\/2021\/12\/Screenshot-2021-12-08-at-09.01.19-1024x645.png 1024w, https:\/\/kinit.sk\/wp-content\/uploads\/2021\/12\/Screenshot-2021-12-08-at-09.01.19-320x200.png 320w, https:\/\/kinit.sk\/wp-content\/uploads\/2021\/12\/Screenshot-2021-12-08-at-09.01.19-768x484.png 768w, https:\/\/kinit.sk\/wp-content\/uploads\/2021\/12\/Screenshot-2021-12-08-at-09.01.19.png 1401w\" data-sizes=\"(max-width: 300px) 100vw, 300px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 300px; --smush-placeholder-aspect-ratio: 300\/189;\" \/><\/a><\/figure>\n<\/div><\/div>\n<\/div>\n\n<div id=\"\" class=\"element core-column\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:50%\"><div id=\"\" class=\"element core-heading\">\n<h2 class=\"wp-block-heading\">Try it out!<\/h2>\n<\/div>\n\n<div id=\"\" class=\"element 
core-paragraph\">\n<p>You can learn more about SlovakBERT and experiment with how it works. Just&nbsp;<a href=\"https:\/\/slovakbert.kinit.sk\">visit the dedicated website<\/a>&nbsp;and try it out for yourself.<\/p>\n<\/div><\/div>\n<\/div><\/div>\n<\/div>\n\n<div id=\"\" class=\"element core-separator\">\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p>Training SlovakBERT required almost two weeks of computation on a powerful server. By comparison, a computer with a mid-range graphics card might take years to finish the same computations, and a regular work laptop perhaps decades. SlovakBERT is now open to the world and&nbsp;<a href=\"https:\/\/github.com\/gerulata\/slovakbert\">accessible<\/a><sup>1<\/sup>&nbsp;to the NLP community. We believe that this step will improve the level of automated Slovak language processing for researchers and companies, as well as for the general public.<\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p>We describe the results of our experiments in the publicly available article&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2109.15254\">SlovakBERT: Slovak Masked Language Model<\/a><sup>2<\/sup>. 
The model proved to be so good that we are already using it in projects with our industry partners, and it might soon appear in the first deployed applications, for example in an upcoming system for analysing the sentiment of customer communication on public social network profiles.<\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-separator\">\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p>&nbsp;<sup>1<\/sup> SlovakBERT at&nbsp;<a href=\"https:\/\/github.com\/gerulata\/slovakbert\">GitHub<\/a><\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p><sup>2<\/sup><em>&nbsp;Mat\u00fa\u0161 Pikuliak,&nbsp;\u0160tefan Grivalsk\u00fd,&nbsp;Martin Kon\u00f4pka,&nbsp;Miroslav Bl\u0161t\u00e1k,&nbsp;Martin Tamajka,&nbsp;Viktor Bachrat\u00fd,&nbsp;Mari\u00e1n \u0160imko,&nbsp;Pavol Bal\u00e1\u017eik,&nbsp;Michal Trnka,&nbsp;Filip Uhl\u00e1rik. 2021.&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2109.15254\" target=\"_blank\" rel=\"noreferrer noopener\">SlovakBERT: Slovak Masked Language Model<\/a><\/em><\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-separator\">\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p><em>This research was carried out with the support of the Ministry of Education, Science, Research and Sports of the Slovak Republic.<\/em><\/p>\n<\/div>","protected":false},"featured_media":31170,"template":"","meta":{"_acf_changed":false,"footnotes":""},"categories":[76,148,414],"class_list":["post-22367","project","type-project","status-publish","has-post-thumbnail","hentry","category-natural-language-processing-sk","category-tools-sk","category-tools-and-models-sk"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>SlovakBERT, the first public Slovak neural language model - KInIT<\/title>\n<meta 
name=\"description\" content=\"KInIT and Gerulata Technologies introduced SlovakBERT, a new language model for Slovak, which will help improve the automatic processing of texts written in Slovak.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/kinit.sk\/sk\/projekt\/slovakbert-the-first-public-slovak-language-model\/\" \/>\n<meta property=\"og:locale\" content=\"sk_SK\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"SlovakBERT, the first public Slovak neural language model - KInIT\" \/>\n<meta property=\"og:description\" content=\"KInIT and Gerulata Technologies introduced SlovakBERT, a new language model for Slovak, which will help improve the automatic processing of texts written in Slovak.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/kinit.sk\/sk\/projekt\/slovakbert-the-first-public-slovak-language-model\/\" \/>\n<meta property=\"og:site_name\" content=\"KInIT\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-02T09:05:44+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/kinit.sk\/wp-content\/uploads\/2024\/02\/202401_projects_features_update_SlovakBert.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1201\" \/>\n\t<meta property=\"og:image:height\" content=\"629\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@kinit\" \/>\n<meta name=\"twitter:label1\" content=\"Predpokladan\u00fd \u010das \u010d\u00edtania\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 min\u00faty\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/projekt\\\/slovakbert-the-first-public-slovak-language-model\\\/\",\"url\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/projekt\\\/slovakbert-the-first-public-slovak-language-model\\\/\",\"name\":\"SlovakBERT, the first public Slovak neural language model - KInIT\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/projekt\\\/slovakbert-the-first-public-slovak-language-model\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/projekt\\\/slovakbert-the-first-public-slovak-language-model\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/kinit.sk\\\/wp-content\\\/uploads\\\/2024\\\/02\\\/202401_projects_features_update_SlovakBert.png\",\"datePublished\":\"2022-01-04T10:26:09+00:00\",\"dateModified\":\"2025-12-02T09:05:44+00:00\",\"description\":\"KInIT and Gerulata Technologies introduced SlovakBERT, a new language model for Slovak, which will help improve the automatic processing of texts written in 
Slovak.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/projekt\\\/slovakbert-the-first-public-slovak-language-model\\\/#breadcrumb\"},\"inLanguage\":\"sk-SK\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/kinit.sk\\\/sk\\\/projekt\\\/slovakbert-the-first-public-slovak-language-model\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"sk-SK\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/projekt\\\/slovakbert-the-first-public-slovak-language-model\\\/#primaryimage\",\"url\":\"https:\\\/\\\/kinit.sk\\\/wp-content\\\/uploads\\\/2024\\\/02\\\/202401_projects_features_update_SlovakBert.png\",\"contentUrl\":\"https:\\\/\\\/kinit.sk\\\/wp-content\\\/uploads\\\/2024\\\/02\\\/202401_projects_features_update_SlovakBert.png\",\"width\":1201,\"height\":629},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/projekt\\\/slovakbert-the-first-public-slovak-language-model\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Tools and models\",\"item\":\"https:\\\/\\\/kinit.sk\\\/category\\\/tools-and-models\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"SlovakBERT, the first public Slovak neural language model\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/#website\",\"url\":\"https:\\\/\\\/kinit.sk\\\/\",\"name\":\"KInIT\",\"description\":\"Vyu\u017e\u00edvame v\u00fdskum pre \u013eud\u00ed a priemysel\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/kinit.sk\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"sk-SK\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"SlovakBERT, the first public Slovak neural language model - KInIT","description":"KInIT and Gerulata Technologies introduced SlovakBERT, a new language model for Slovak, which will help improve the automatic processing of texts written in Slovak.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/kinit.sk\/sk\/projekt\/slovakbert-the-first-public-slovak-language-model\/","og_locale":"sk_SK","og_type":"article","og_title":"SlovakBERT, the first public Slovak neural language model - KInIT","og_description":"KInIT and Gerulata Technologies introduced SlovakBERT, a new language model for Slovak, which will help improve the automatic processing of texts written in Slovak.","og_url":"https:\/\/kinit.sk\/sk\/projekt\/slovakbert-the-first-public-slovak-language-model\/","og_site_name":"KInIT","article_modified_time":"2025-12-02T09:05:44+00:00","og_image":[{"width":1201,"height":629,"url":"https:\/\/kinit.sk\/wp-content\/uploads\/2024\/02\/202401_projects_features_update_SlovakBert.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_site":"@kinit","twitter_misc":{"Predpokladan\u00fd \u010das \u010d\u00edtania":"2 min\u00faty"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/kinit.sk\/sk\/projekt\/slovakbert-the-first-public-slovak-language-model\/","url":"https:\/\/kinit.sk\/sk\/projekt\/slovakbert-the-first-public-slovak-language-model\/","name":"SlovakBERT, the first public Slovak neural language model - 
KInIT","isPartOf":{"@id":"https:\/\/kinit.sk\/#website"},"primaryImageOfPage":{"@id":"https:\/\/kinit.sk\/sk\/projekt\/slovakbert-the-first-public-slovak-language-model\/#primaryimage"},"image":{"@id":"https:\/\/kinit.sk\/sk\/projekt\/slovakbert-the-first-public-slovak-language-model\/#primaryimage"},"thumbnailUrl":"https:\/\/kinit.sk\/wp-content\/uploads\/2024\/02\/202401_projects_features_update_SlovakBert.png","datePublished":"2022-01-04T10:26:09+00:00","dateModified":"2025-12-02T09:05:44+00:00","description":"KInIT and Gerulata Technologies introduced SlovakBERT, a new language model for Slovak, which will help improve the automatic processing of texts written in Slovak.","breadcrumb":{"@id":"https:\/\/kinit.sk\/sk\/projekt\/slovakbert-the-first-public-slovak-language-model\/#breadcrumb"},"inLanguage":"sk-SK","potentialAction":[{"@type":"ReadAction","target":["https:\/\/kinit.sk\/sk\/projekt\/slovakbert-the-first-public-slovak-language-model\/"]}]},{"@type":"ImageObject","inLanguage":"sk-SK","@id":"https:\/\/kinit.sk\/sk\/projekt\/slovakbert-the-first-public-slovak-language-model\/#primaryimage","url":"https:\/\/kinit.sk\/wp-content\/uploads\/2024\/02\/202401_projects_features_update_SlovakBert.png","contentUrl":"https:\/\/kinit.sk\/wp-content\/uploads\/2024\/02\/202401_projects_features_update_SlovakBert.png","width":1201,"height":629},{"@type":"BreadcrumbList","@id":"https:\/\/kinit.sk\/sk\/projekt\/slovakbert-the-first-public-slovak-language-model\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/kinit.sk\/sk\/"},{"@type":"ListItem","position":2,"name":"Tools and models","item":"https:\/\/kinit.sk\/category\/tools-and-models\/"},{"@type":"ListItem","position":3,"name":"SlovakBERT, the first public Slovak neural language model"}]},{"@type":"WebSite","@id":"https:\/\/kinit.sk\/#website","url":"https:\/\/kinit.sk\/","name":"KInIT","description":"Vyu\u017e\u00edvame v\u00fdskum pre \u013eud\u00ed a 
priemysel","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/kinit.sk\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"sk-SK"}]}},"_links":{"self":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/project\/22367","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/project"}],"about":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/types\/project"}],"version-history":[{"count":2,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/project\/22367\/revisions"}],"predecessor-version":[{"id":22474,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/project\/22367\/revisions\/22474"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/media\/31170"}],"wp:attachment":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/media?parent=22367"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/categories?post=22367"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}