Research Group

Natural Language Processing

Miroslav Blšták

Research areas: natural language processing, computational linguistics, data processing, machine learning, computer-assisted learning, computer-aided education

Position: AI Specialist

Miroslav has a background in natural language processing tasks, combining linguistic and machine learning approaches. He has built several tools, lexicons and knowledge sources used in Slovak language processing.

He has experience with several NLP tasks: automatic question generation, tokenization, lemmatization, stemming, part-of-speech tagging, named entity recognition, diacritic restoration and text reconstruction, term extraction, text simplification, text similarity, sentiment analysis, coreference resolution, temporal data extraction and legal document extraction.

As a teacher at the Slovak University of Technology, he supervised more than 20 Bachelor’s and Master’s theses in NLP. He also has experience in the processing and representation of structured and textual data, as well as in software engineering (software development and software architecture).

Selected Projects

ČSOB: Enabling a virtual assistant to interact with clients in a human way

During this cooperation with the ČSOB bank, we focused on research and development of methods for transforming Slovak words between different forms. This is necessary for a smooth conversation…

Hopero.AI: European Digital Innovation Hub

The Slovak AI European Digital Innovation Hub is a nation-wide ecosystem with a clear focus on artificial intelligence aiming to support the digital transformation of Slovak companies in the European…

MIMEDIS: The impact of media discourse on attitudes towards migration

KInIT joins forces with the Comenius University in Bratislava to analyse the impact of media discourse on attitudes towards migration, migrants and migration policy in Slovakia. Over the last decades,…

Bencont: Streamlining the debt collection process

Debt collection is a crucial component of a sustainable financial system. The key challenge in this context lies in managing the substantial volume of debts that require attention. Artificial intelligence…

Other notable projects

Aspecta: Improving Public Procurement using Natural Language Processing

We collaborate with Aspecta to revolutionize the public procurement process through natural language processing (NLP) and language technologies. The multilabel classification and semantic similarity algorithms, as well as the use…

DisAI: Improving scientific excellence of KInIT in AI and language technologies to fight disinformation

We succeeded in Horizon Europe’s Twinning scheme and joined forces with leading research institutions in natural language processing and combating disinformation. German Research Center for Artificial Intelligence (DFKI), University of…

AI4Europe: The unified platform for boosting European AI academic and industrial research

KInIT joins forces with 23 prominent European institutions to build a better artificial intelligence platform supporting daily needs of academic and industrial AI researchers. In this project, we primarily focus…

Diacritics Restorer: Automatic diacritics restoration for Slovak Google Docs

One of the minor Natural Language Processing (NLP) tasks is the restoration of diacritics. This topic is not widely discussed as it is not an issue for major languages, such…

Seesame: Understanding social media conversations through AI

Social networks are a phenomenon of today with a significant impact on society. They can significantly shape public opinion. Discussions on social networks can be stimulating and bring new ideas,…

Selected Publications

Slovak Conceptual Dictionary

Blšták, M. – arXiv

When solving tasks in the field of natural language processing, we sometimes need dictionary tools, such as lexicons, word form dictionaries or knowledge bases. However, the availability of dictionary data…

Automatic question generation based on sentence structure analysis using machine learning approach

Blstak, M., Rozinajova, V. – Natural Language Engineering, 2021

Automatic question generation is one of the most challenging tasks of Natural Language Processing. It requires “bidirectional” language processing: firstly, the system has to understand…

Constructing Sentiment Lexicon with Game for Annotation Collection

Radosky, L., Blstak, M. – International Conference on Statistical Language and Speech Processing, 2021

While research on sentiment analysis has become very popular globally, in the Slovak language, as an under-resourced language, there are still many issues to…

Building an Agent for Factual Question Generation Task

Blstak, M., Rozinajova, V. – World Symposium on Digital Intelligence for Systems and Machines (DISA), 14 October 2018

With the boom of e-learning and online education systems, question generation systems have also become more interesting. Nowadays, it is relatively common and simple to use the internet and web technologies…

SlovakBERT: Slovak Masked Language Model

Pikuliak, M., Grivalsky, S., Konopka, M., Blstak, M., Tamajka, M., Bachraty, V., Simko, M., Balazik, P. (Gerulata Technologies), Trnka, M. (Gerulata Technologies), Uhlarik, F. (Gerulata Technologies) – Findings of the Association for Computational Linguistics: EMNLP, 11 December 2022

We introduce a new Slovak masked language…

Automatic question generation based on analysis of sentence structure

Blstak, M., Rozinajova, V. – Conference on Text, Speech, and Dialogue, 2016

Selected Student Supervision

Master

Lukáš Radoský – Similarity of short texts in Slovak language. Defended 2021.

Simona Zelenčíková – Coreference resolution in text. Defended 2021.

Martin Grega – Temporal sequence identification in the text. Defended 2021.

Richard Galeštok – Automated analysis of sentiment in Slovak texts. Defended 2020.

Lukáš Belaj – Named entity extraction from Slovak text. Defended 2019.

Lukáš Miškovský – Identification of coreference links in text. Defended 2017.

Bachelor

Michal Hunák – Detecting tricky plagiarism. Defended 2020.

Juraj Gemeľa – Building Dictionaries by Games. Defended 2019.

Ondrej Harnúšek – Tool for determining similarity of texts. Defended 2019.

Lukáš Radoský – Lexicon construction using games. Defended 2019.

Tomáš Gábrš – Question generation from educational text. Defended 2017.

Martin Nemček – Educational texts processing. Defended 2016.