GEPERO: Generation of personalised content in the research of information quality

Project

Duration: 11/2024 - 06/2026

Funding agency: Recovery and Resilience Plan

Project type: Scientific project

Principal investigator: Ivan Srba

The GEPERO project focuses on the research and development of new methods and models for generating personalised texts in multiple languages, dedicated to research on information quality on the web and social media. It is primarily concerned with generative AI in the form of large language models. GEPERO is conceived as an extension of the AI-CODE project (Horizon Europe), in which KInIT participates.

The primary goal of the GEPERO project is the research and development of new methods and models for generating personalised texts in multiple languages, dedicated to research on information quality on the web and social media. To fulfil this goal, the GEPERO project specifically: 1) explores the potential of large language models to generate personalised multilingual synthetic data, as well as to paraphrase and summarise existing texts; 2) proposes and experimentally evaluates methods and models for the generation of personalised texts; 3) applies the proposed methods and models to create reusable and representative datasets, contributing to greater accuracy and robustness of the tools that support media professionals.

Compared to the ongoing Horizon Europe project AI-CODE, GEPERO focuses on the personalised generation of multilingual textual content. A personalised text is content specifically tailored to a given context. In GEPERO, we focus on two types of personalisation: 1) personalisation for specific target groups, identified by demographic and personality characteristics (e.g., high-school students); and 2) personalisation for specific target social-media platforms, identified by characteristics of the text such as length, form, or style (e.g., use of hashtags and emoticons). Research on the generation of personalised textual content is absent from the current state of the art. Multilingual text generation means generating texts in multiple languages, especially low-resource languages, among which Slovak belongs. The multilingual aspect extends the existing state of the art, where most research focuses solely on English (or other high-resource, widely spoken languages).

In the GEPERO project, we face multiple implementation challenges, such as the complexity of annotating personalisation in generated texts, the computational cost of text generation, and ethical and moral questions regarding potential misuse of the research outputs. The solution will be based on the careful use of automated means involving artificial intelligence in individual project stages, as well as restrictions on the publication of misuse-sensitive data, methods, and models.

Project team

Ivan Srba

Researcher

Dominik Macko

Researcher

Aneta Žugecová

Volunteer 09/2024 - 01/2025

Andrew Pulver

Research Intern 04/2025 - 08/2025

Samuel Budai

Research Engineer 08/2024 - 05/2026

Matej Mosnár

Research Engineer

Adam Škurla

PhD Student

Jozef Barut

Research Intern

Katarína Házyová

Project Administrator

Marianna Palková

Communications Specialist

Funded by the EU NextGenerationEU through the Recovery and Resilience Plan for Slovakia under the project No. 09I01-03-V04-00068.

Related Publications

Zugecova, A., Macko, D., Srba, I., Moro, R., Kopal, J., Marcincinova, K., & Mesarcik, M. (2024). Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation. arXiv preprint arXiv:2412.13666.

Deliverables

D1.1 Interim report on project implementation and achieved results

D2.1 Report on communication, dissemination, and exploitation results

D3.1 Research report on the potential of large language models to generate personalized text

D3.2 Research report on methods for generating personalized text

D3.3 Research report on augmented datasets for training models in information quality research