SlovakBERT: Slovak Masked Language Model

Pikuliak, M., Grivalský, Š., Konôpka, M., Blšták, M., Tamajka, M., Bachratý, V., Šimko, M., Balážik, P.¹, Trnka, M.¹, Uhlárik, F.¹

¹ Gerulata Technologies

In this paper, we introduce SlovakBERT, a new Slovak masked language model. It is the first Slovak-only transformer-based model trained on a sizeable corpus. We evaluate the model on several NLP tasks and achieve state-of-the-art results. We publish the masked language model, as well as models subsequently fine-tuned for part-of-speech tagging, sentiment analysis and semantic textual similarity.
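The released models can be used through the Hugging Face transformers library. Below is a minimal sketch of querying the masked language model with a fill-mask pipeline; the hub identifier gerulata/slovakbert and the RoBERTa-style <mask> token are assumptions about the published release and are not stated on this page.

```python
# Minimal sketch: querying the published Slovak masked language model.
# The model identifier "gerulata/slovakbert" is an assumption about where
# the model is hosted on the Hugging Face Hub.
from transformers import pipeline

# Fill-mask pipeline backed by the pretrained masked language model.
fill_mask = pipeline("fill-mask", model="gerulata/slovakbert")

# Predict the masked token in a Slovak sentence
# ("The capital of Slovakia is <mask>.").
predictions = fill_mask("Hlavné mesto Slovenska je <mask>.")
for p in predictions:
    print(p["token_str"], round(p["score"], 3))
```

The same identifier can be passed to AutoModel and AutoTokenizer when the model is used as an encoder for downstream fine-tuning rather than for masked-token prediction.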

Cite: Pikuliak, M., Grivalský, Š., Konôpka, M., Blšták, M., Tamajka, M., Bachratý, V., Šimko, M., Balážik, P., Trnka, M., Uhlárik, F.: SlovakBERT: Slovak Masked Language Model. arXiv:2109.15254 [Preprint], 2021.

Authors

Matúš Pikuliak
Junior Researcher
Štefan Grivalský
Research Assistant
Martin Konôpka
Research Engineer
Miroslav Blšták
Junior Researcher
Martin Tamajka
Junior Researcher
Viktor Bachratý
Research Consultant
Marián Šimko
Expert Researcher
More