Slovak Conceptual Dictionary
A Comprehensive Resource for Automated Slovak Text Analysis
KInIT, in collaboration with Eduself.sk, has developed a conceptual dictionary designed to streamline and enhance Natural Language Processing (NLP) tasks for the Slovak language.
Slovak has long lacked accessible, high-quality linguistic tools for processing texts in our language. This hinders not only the effective processing of Slovak texts, but also the creation of other text analysis tools that depend on high-quality dictionary data. Even for common tasks such as extracting information from text (e.g. extracting people and locations mentioned in a text, or identifying inappropriate or toxic words in comments), building your own dictionaries is time-consuming, yet often necessary. The problem is not only the small number of dictionaries available for Slovak, but also that they are technologically closed. Without application programming interfaces (APIs), software cannot make full use of Slovak language data, which leaves the processing of our language lagging behind global competition. Foreign dictionaries, in turn, do not account for the specific features of Slovak, which makes adapting them complicated.
As a result, we decided to develop our own conceptual dictionary for Slovak. It is accessible through both a web interface (https://pojmy.kinit.sk) and a machine-readable API for use in software applications. By providing information on concepts and the relationships between them, the dictionary offers powerful ways to improve text understanding and is suitable for a wide range of NLP tasks. Beyond semantic analysis, such as information extraction or text categorization, it also aids lexical and contextual analysis. For example, identifying grammatical categories helps disambiguate words with multiple meanings. The dictionary can also recognize when different words refer to the same concept, such as nicknames or diminutives (e.g., Miroslav, Miro, or Mirko).
With approximately 145,000 concepts and 355,000 relationships, it is currently the most comprehensive tool of its kind for the Slovak language. It is also the only one that offers a machine-readable interface for developers.
The dictionary can be utilized to solve a wide range of (sub-)tasks in Slovak text processing:
- Location Extraction: Identification of cities, municipalities, countries, rivers, and various other geographical entities.
- Location-Based Relationships: Mapping relationships between cities and countries (e.g., Trnava – Slovakia), locations and inhabitants (Košice – Košičan), or adjective derivations (Váh – vážsky).
- Person Extraction: Identification of first names and their alternative forms or diminutives (e.g., Miroslav – Mirko – Miro).
- Semantic Similarity: Identification of synonyms, diminutives, augmentatives, and other alternative labels for the same concept.
- Cross-Part-of-Speech Similarity: Linking related concepts across different parts of speech (e.g., the noun cooking linked to the verb cook, the participle cooking/cooked, and the adjective boiling/culinary).
- Hierarchy Identification: Mapping general-to-specific relationships (e.g., animal – dog) and supertype-subtype hierarchies (e.g., meter – millimeter or doctor – veterinarian).
- Word Sense Disambiguation: Distinguishing between polysemous words (e.g., “práčka” as a washing machine vs. a person who washes clothes).
- Antonyms and Negation: Identification of opposite meanings (north – south) and negation via prefixes (relevant – irrelevant).
- Number Normalization: Standardizing numerical data (e.g., the number 10 and its variants: desať, desiatka, or the Roman numeral X).
- Emoji Normalization: Mapping emojis to their corresponding textual concepts (e.g., 🍐 – pear).
- Role Distinction: Distinguishing gender roles (male student – female student), individual vs. collective roles (viewer – audience), and roles linked to locations or ideologies (Christian – Christianity).
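To make a few of the tasks above more concrete, the following Python sketch shows how concept and relationship data of this kind could be used for label normalization and hierarchy lookup. It is only an illustration under stated assumptions: the triples are a toy in-memory sample built from the examples in this article, and the relation names (`alt_label_of`, `subtype_of`) and helper functions are hypothetical. It does not call the real API and does not reflect the dictionary's actual schema.

```python
from collections import defaultdict

# Toy sample of (source label, relation, target label) triples.
# ASSUMPTION: relation names and data layout are invented for this sketch.
TRIPLES = [
    ("Miro", "alt_label_of", "Miroslav"),
    ("Mirko", "alt_label_of", "Miroslav"),
    ("desať", "alt_label_of", "10"),
    ("desiatka", "alt_label_of", "10"),
    ("X", "alt_label_of", "10"),
    ("🍐", "alt_label_of", "pear"),
    ("dog", "subtype_of", "animal"),
    ("veterinarian", "subtype_of", "doctor"),
    # Made-up extra level, added only to show multi-step traversal.
    ("animal", "subtype_of", "organism"),
]


def build_index(triples):
    """Split triples into an alias map and a child -> parents map."""
    canonical = {}
    parents = defaultdict(set)
    for source, relation, target in triples:
        if relation == "alt_label_of":
            canonical[source] = target
        elif relation == "subtype_of":
            parents[source].add(target)
    return canonical, parents


def normalize(token, canonical):
    """Return the canonical label for a token, or the token itself if unknown."""
    return canonical.get(token, token)


def ancestors(concept, parents):
    """Collect all more general concepts reachable via subtype_of links."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


canonical, parents = build_index(TRIPLES)

tokens = ["Mirko", "zjedol", "desať", "🍐"]
print([normalize(t, canonical) for t in tokens])
# → ['Miroslav', 'zjedol', '10', 'pear']

print(sorted(ancestors("dog", parents)))
# → ['animal', 'organism']
```

The lookup here is an exact, case-sensitive string match on single tokens. Real text would also need lemmatization and context (for instance, "X" is not always the numeral 10), which is the kind of information the dictionary's grammatical categories and sense distinctions are meant to supply.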
Detailed documentation regarding categories, relationships, and API usage is available on the project website.
Blšták, M. (2025). Slovak Conceptual Dictionary. arXiv:2512.00579 [cs.CL]. https://arxiv.org/abs/2512.00579