GEPERO: Generation of personalised content in the research of information quality
The GEPERO project focuses on the research and development of new methods and models for generating personalised texts in multiple languages, intended for research on information quality on the web and social media. It centres on generative AI in the form of large language models. GEPERO is conceived as an extension of the AI-CODE project (Horizon Europe), in which KInIT participates.
The primary goal of the GEPERO project is the research and development of new methods and models for generating personalised texts in multiple languages, intended for research on information quality on the web and social media. To fulfil this goal, the GEPERO project specifically: 1) explores the potential of large language models to generate personalised multilingual synthetic data and to paraphrase or summarise existing texts; 2) proposes and experimentally evaluates methods and models for generating personalised texts; 3) applies the proposed methods and models to create reusable and representative datasets, helping to increase the accuracy and robustness of tools that support media professionals.
Compared to the ongoing Horizon Europe project AI-CODE, GEPERO will focus on the personalised generation of multilingual textual content. A personalised text is content specifically tailored to a given context. In GEPERO, we focus on two types of personalisation: 1) personalisation for specific target groups, identified by demographic and personality characteristics (e.g., high-school students); and 2) personalisation for specific target social-media platforms, identified by characteristics of the length, form, or style of the text (e.g., usage of hashtags and emoticons). Research on the generation of personalised textual content is completely absent from the state of the art. Multilingual text generation means generating texts in multiple languages, especially in low-resource languages, which include Slovak. The multilingual aspect suitably extends the existing state of the art, where most research focuses solely on English (or other high-resource, widely spoken languages).
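The two types of personalisation described above can be pictured as fields that are combined into an instruction for a large language model. The following Python snippet is only a minimal sketch under stated assumptions: the names `PersonalisationSpec` and `build_prompt`, the prompt wording, and the sample sentence are hypothetical illustrations and do not represent the methods developed in GEPERO. No model is called; the sketch only assembles the instruction text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonalisationSpec:
    """Hypothetical container for the personalisation settings (an assumption)."""
    language: str                          # e.g. "Slovak"
    target_group: Optional[str] = None     # e.g. "high-school students"
    platform_style: Optional[str] = None   # e.g. "short post with hashtags and emoticons"


def build_prompt(source_text: str, spec: PersonalisationSpec) -> str:
    """Assemble an illustrative instruction prompt; the wording is an assumption."""
    parts = [f"Paraphrase the following text in {spec.language}."]
    if spec.target_group:
        # Personalisation type 1: a specific target group.
        parts.append(f"Tailor it to this target group: {spec.target_group}.")
    if spec.platform_style:
        # Personalisation type 2: a specific social-media platform style.
        parts.append(f"Match this platform style: {spec.platform_style}.")
    parts.append(f"Text: {source_text}")
    return "\n".join(parts)


spec = PersonalisationSpec(
    language="Slovak",
    target_group="high-school students",
    platform_style="short post with hashtags and emoticons",
)
prompt = build_prompt("The city library opens a new reading room next week.", spec)
print(prompt)
```

Keeping the target group and the platform style as separate optional fields mirrors the distinction between the two personalisation types: either can be used alone or both can be combined.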
The GEPERO project faces multiple implementation challenges, such as the complexity of annotating personalisation in generated texts, the computational cost of text generation, and ethical and moral questions regarding potential misuse of the research outputs. The solution will be based on the careful use of automated means involving artificial intelligence in individual project stages, as well as on restricting the publication of misuse-sensitive data, methods, and models.
Project team
Funded by the EU NextGenerationEU through the Recovery and Resilience Plan for Slovakia under the project No. 09I01-03-V04-00068.
Related Publications
- Zugecova, A., Macko, D., Srba, I., Moro, R., Kopal, J., Marcincinova, K., & Mesarcik, M. (2024). Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation. arXiv preprint arXiv:2412.13666.