SlovakBERT: Slovak Masked Language Model

Pikuliak, M., Grivalsky, S., Konopka, M., Blstak, M., Tamajka, M., Bachraty, V., Simko, M., Balazik, P.¹, Trnka, M.¹, Uhlarik, F.¹

¹ Gerulata Technologies

We introduce SlovakBERT, a new Slovak masked language model. It is the first Slovak-only transformer-based model trained on a sizeable corpus. We evaluate the model on several NLP tasks and achieve state-of-the-art results. We publish the masked language model, as well as the models subsequently fine-tuned for part-of-speech tagging, sentiment analysis, and semantic textual similarity.

Cite: Matúš Pikuliak, Štefan Grivalský, Martin Konôpka, Miroslav Blšták, Martin Tamajka, Viktor Bachratý, Marián Šimko, Pavol Balážik, Michal Trnka, and Filip Uhlárik. 2022. SlovakBERT: Slovak Masked Language Model. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 7156–7168, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

Authors

Matúš Pikuliak
Research Consultant 10/2022-01/2024
Štefan Grivalský
Researcher 11/2020-08/2021
Martin Konôpka
Researcher 10/2020-12/2021
Miroslav Blšták
Research Engineer
Martin Tamajka
Research Engineer
Viktor Bachratý
Research Consultant
Marián Šimko
Lead and Researcher