RobIndAI: Robust indication of AI-generated disinformation content in multilingual online space

The RobIndAI project will fight the misuse of AI for disinformation text generation by increasing the robustness of methods for machine-generated text detection. The project focuses on multilingual content, especially the languages of the Central European region, targeting news articles and social-media content. RobIndAI is conceived as an extension of the VIGILANT project (Horizon Europe), in which KInIT participates.

The goal of the RobIndAI project is to research artificial intelligence methods and models for increasing the robustness of indicating disinformation content (from the web and social media), with a particular focus on the detection of machine-generated texts. Since modern language models can generate high-quality multilingual text that is indistinguishable from human writing, concerns about the misuse of such technology (e.g., in international disinformation campaigns) are growing. Reliable detection of AI-generated text, differentiating it from authentic human-written content, is therefore a key indicator.

In the RobIndAI project, we will use fundamentally multilingual methods and models for text processing, specifically tailored for the Central European information space. Within the project, we will create a benchmark focused on this region, comparing the performance of existing methods for AI-generated text detection. The benchmark study will also examine the robustness of such methods against existing attacks and obfuscation techniques designed to evade detection. Compared to the ongoing Horizon Europe project VIGILANT, RobIndAI will introduce more advanced text processing methods (primarily based on the latest large language models), regional and content-domain specificity of methods (along with a new dataset focused on our region), a deeper comparison of architectural alternatives for detection (a dedicated model for each language vs. a single multilingual model), and robustness against new, more sophisticated attacks.
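To illustrate the kind of robustness evaluation such a benchmark involves, the sketch below measures how a detector's accuracy degrades under a simple character-level obfuscation attack (replacing Latin letters with visually similar Cyrillic homoglyphs, a known evasion technique). The detector here is a hypothetical stand-in, not a RobIndAI model, and the homoglyph table is deliberately minimal.

```python
# Sketch of a robustness check for a machine-generated-text detector:
# compare accuracy on clean samples vs. samples obfuscated by a
# homoglyph attack. All names here are illustrative placeholders.

HOMOGLYPHS = {"a": "а", "e": "е", "o": "о", "c": "с", "p": "р"}  # Latin -> Cyrillic

def obfuscate(text: str) -> str:
    """Apply a character-level homoglyph attack intended to evade detection."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

def accuracy(detector, samples) -> float:
    """Fraction of (text, label) pairs the detector classifies correctly."""
    return sum(detector(text) == label for text, label in samples) / len(samples)

def robustness_report(detector, samples) -> dict:
    """Accuracy before and after the attack, plus the resulting drop."""
    clean = accuracy(detector, samples)
    attacked = accuracy(detector, [(obfuscate(t), y) for t, y in samples])
    return {"clean": clean, "attacked": attacked, "drop": clean - attacked}
```

A large `drop` value signals a brittle detector; a robust one should score nearly the same on clean and obfuscated inputs.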

The project is based on the assumption that AI-generated texts exhibit characteristic patterns, which can be identified by analytic methods and by artificial intelligence itself. With regard to disinformation, the project treats machine-generated text as a positive indicator of mass-spread disinformation in the online space.
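As a toy illustration of such a "characteristic pattern", the sketch below flags texts with unusually low lexical diversity. This single feature is far weaker than what real detectors use (typically language-model statistics such as per-token log-likelihood or learned classifiers), and the threshold is arbitrary; it only shows the shape of a pattern-based indicator.

```python
# Illustrative only: one hand-crafted stylometric feature as a stand-in
# for the learned patterns a real detector would rely on.

def type_token_ratio(text: str) -> float:
    """Lexical diversity: unique tokens divided by total tokens."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def flag_machine_generated(text: str, threshold: float = 0.5) -> bool:
    """Flag text whose lexical diversity falls below an (arbitrary) threshold."""
    return type_token_ratio(text) < threshold
```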

RobIndAI uses modern machine learning, natural language processing, and data analysis methods to address the problem of detecting machine-generated text in online media. A key factor is the acquisition of high-quality training data and a diverse dataset (augmented with paraphrased texts) to ensure the models' effectiveness in the real world.
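The augmentation step described above can be sketched as follows. The `paraphrase` function here is a hypothetical placeholder (in practice it would call a paraphrasing model); the point is that each labeled example gains paraphrased variants while keeping its original label.

```python
# Sketch of dataset augmentation with paraphrased texts, assuming a
# dataset of (text, label) pairs. `paraphrase` is a crude placeholder.
import random

def paraphrase(text: str, rng: random.Random) -> str:
    """Placeholder paraphraser: shuffles sentence order as a rough proxy."""
    sentences = [s for s in text.split(". ") if s]
    rng.shuffle(sentences)
    return ". ".join(sentences)

def augment(dataset, n_variants: int = 1, seed: int = 0):
    """Return the dataset plus paraphrased copies, labels preserved."""
    rng = random.Random(seed)
    out = list(dataset)
    for text, label in dataset:
        out.extend((paraphrase(text, rng), label) for _ in range(n_variants))
    return out
```

Training on such augmented data is one common way to make a detector less sensitive to paraphrasing attacks, since it sees multiple surface forms of the same content.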

Project team

Jakub Šimko
Lead and Researcher
Dominik Macko
Researcher
Jakub Kopál
Research Engineer
Michal Spiegel
Research Intern
Adam Škurla
Research Intern
Katarína Házyová
Project Administrator
Marianna Palková
Communications Specialist
Adrián Gavorník
Ethics Specialist
Samuel Budai
Research Intern

Funded by the EU NextGenerationEU through the Recovery and Resilience Plan for Slovakia under the project No. 09I01-03-V04-00059.

Related Publications

  • Macko, D., Moro, R., & Srba, I. (2025). Increasing the Robustness of the Fine-tuned Multilingual Machine-Generated Text Detectors. arXiv preprint arXiv:2503.15128.
  • Macko, D., Ramakrishnan, A. A., Lucas, J. S., Moro, R., Srba, I., Uchendu, A., & Lee, D. (2025). Beyond speculation: Measuring the growing presence of LLM-generated texts in multilingual disinformation. arXiv preprint arXiv:2503.23242.