Slovak NLP Community Meeting #3
Program
12:30 Arrival, joint lunch
13:20 Opening
13:30 Invited talk: Daniel Hládek: Training and Evaluation of Slovak Vector Models Using a Question–Answer Dataset
14:15 Roundtable (team updates + discussion)
I. Team updates (2025 overview) – approx. 40–50 min
II. Discussion
16:15 Poster session
17:30 Closing and social program
Supercomputer Perun tour, dinner at 19:00, etc.
Coffee breaks will be scheduled during the program as agreed on-site.
Training and Evaluation of Slovak Vector Models Using a Question–Answer Dataset
In this talk, I will introduce Retrieval SkQuAD, a new benchmark designed to support research in information retrieval for the Slovak language. While extensive evaluation datasets exist for English and other major languages, Slovak has so far lacked a comparable resource. Retrieval SkQuAD addresses this gap with 19,000 manually annotated answers to 1,134 questions; each answer comes with relevance assessments and information about the usefulness of documents in generating responses. The benchmark is integrated into the BEIR and MTEB frameworks, ensuring compatibility with established standards for multilingual evaluation. The talk will also cover how we fine-tuned several sentence-transformer and BERT models on this dataset, using adversarial questions as hard negative examples to improve model robustness. Finally, I will outline potential directions for future research in training and evaluating Slovak information retrieval models.

Summary
The third meeting of the Slovak NLP community was hosted by the Technical University of Košice in the University Library. The program began with an invited talk by Daniel Hládek from KEMT FEI TUKE and continued with updates from nine participating research teams. During the open roundtable discussion, participants shared information about Slovakia’s participation in the CLARIN consortium and introduced the idea of a joint language AI hub. The discussion also covered priorities for developing language resources and for strengthening collaboration on these initiatives. New at this meeting was the poster session, where researchers, PhD students in particular, presented the results of their projects. The program concluded with a tour of the Perun supercomputer.

