SlovakBERT: Slovak Masked Language Model

Pikuliak, M., Grivalsky, S., Konopka, M., Blstak, M., Tamajka, M., Bachraty, V., Simko, M., Balazik, P.¹, Trnka, M.¹, Uhlarik, F.¹

¹ Gerulata Technologies

In this paper, we introduce SlovakBERT, a new Slovak masked language model. It is the first Slovak-only transformer-based model trained on a sizeable corpus. We evaluate the model on several NLP tasks and achieve state-of-the-art results. We publish the masked language model, as well as the models subsequently fine-tuned for part-of-speech tagging, sentiment analysis, and semantic textual similarity.

Cite: Pikuliak, M., Grivalsky, S., Konopka, M., Blstak, M., Tamajka, M., Bachraty, V., Simko, M., Balazik, P., Trnka, M., Uhlarik, F. SlovakBERT: Slovak Masked Language Model. arXiv:2109.15254 [Preprint], 2021.

Authors

Matúš Pikuliak
Researcher
Štefan Grivalský
Research Assistant 11/2020 - 08/2021
Martin Konôpka
Research Engineer 10/2020 - 12/2021
Miroslav Blšták
Research Engineer
Martin Tamajka
Research Engineer
Viktor Bachratý
Research Consultant
Marián Šimko
Lead and Researcher