SlovakBERT: Slovak Masked Language Model
Pikuliak, M., Grivalsky, S., Konopka, M., Blstak, M., Tamajka, M., Bachraty, V., Simko, M., Balazik, P.¹, Trnka, M.¹, Uhlarik, F.¹
¹ Gerulata Technologies
In this paper, we introduce SlovakBERT, a new Slovak masked language model. It is the first Slovak-only transformer-based model trained on a sizeable corpus. We evaluate the model on several NLP tasks and achieve state-of-the-art results. We publish the masked language model, as well as the subsequently fine-tuned models for part-of-speech tagging, sentiment analysis and semantic textual similarity.
Cite: Matúš Pikuliak, Štefan Grivalský, Martin Konôpka, Miroslav Blšták, Martin Tamajka, Viktor Bachratý, Marian Simko, Pavol Balážik, Michal Trnka, and Filip Uhlárik. 2022. SlovakBERT: Slovak Masked Language Model. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 7156–7168, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.