PhD Themes 2024: Addressing Limitations of Large Language Models
Supervising team: Michal Gregor (supervisor, KInIT), Jana Kosecka (George Mason University)
Keywords: large language models, deep learning, machine learning, multi-modal, in-context learning, long context, fine-tuning
Large language models (LLMs) are powerful tools that can support a wide range of downstream tasks. They can be used, for example, in advanced conversational interfaces or in tasks involving retrieval, classification, generation, and more. Such tasks can be approached through zero-shot or few-shot in-context learning, or by fine-tuning the LLM on larger datasets (typically using parameter-efficient techniques to reduce memory and storage requirements).
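To make the zero-shot vs. few-shot distinction concrete, here is a minimal sketch of in-context learning as prompt construction. The `build_prompt` helper is a hypothetical name introduced for illustration, not part of any specific library:

```python
def build_prompt(instruction, examples, query):
    """Assemble an in-context learning prompt.

    With an empty `examples` list this is a zero-shot prompt; with a few
    labeled (input, output) demonstrations it becomes a few-shot prompt.
    """
    parts = [instruction]
    for inp, out in examples:  # demonstrations the model can imitate
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")  # the model continues from here
    return "\n\n".join(parts)

# Zero-shot: the instruction alone must carry the task.
zero_shot = build_prompt(
    "Classify the sentiment as positive or negative.", [], "I loved it."
)

# Few-shot: two demonstrations precede the query.
few_shot = build_prompt(
    "Classify the sentiment as positive or negative.",
    [("Great movie!", "positive"), ("Waste of time.", "negative")],
    "I loved it.",
)
```

The resulting string would then be passed to an LLM for completion; fine-tuning, by contrast, would instead update (a subset of) the model's parameters on such labeled pairs.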
Despite their unprecedented performance in many tasks, LLMs suffer from several significant limitations that currently hinder their safe and widespread use in many domains. These limitations include tendencies to generate responses not supported by the training corpus or input context (hallucination), difficulties in handling extremely long contexts (e.g., entire books), and limited ability to utilize other data modalities such as vision, where state-of-the-art models generally struggle to recognize fine-grained concepts.
The goal of this research is to explore these limitations and, after selecting one or two to focus on, to propose new strategies to mitigate them. These strategies may include, for example:
- Shifting the generation mode closer to retrieval-style approaches and non-parametric language models;
- Augmenting models with self-correction mechanisms and self-evaluation pipelines;
- Efficiently supporting extended contexts;
- Fuller utilization of multimodality, especially in vision-language models, including explainability analysis and the design of new training mechanisms that support the recognition of fine-grained visual concepts;
- Introducing novel fine-tuning techniques;
- Improving and further utilizing the reasoning abilities of LLMs.
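As a toy illustration of the first strategy above (grounding generation in retrieval), the sketch below scores candidate passages against a query using bag-of-words cosine similarity and builds a prompt that asks the model to answer only from the retrieved evidence. All names are illustrative assumptions; a real system would use dense embeddings and an actual LLM call:

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k passages most lexically similar to the query."""
    q = Counter(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda p: cosine(q, Counter(p.lower().split())),
                    reverse=True)
    return ranked[:k]


corpus = [
    "The Kempelen Institute of Intelligent Technologies is based in Bratislava.",
    "Large language models can hallucinate unsupported claims.",
]
question = "Where is the Kempelen Institute based?"
evidence = retrieve(question, corpus)[0]

# Constrain generation to the retrieved passage to reduce hallucination.
prompt = (f"Answer using ONLY this evidence:\n{evidence}\n\n"
          f"Question: {question}")
```

The key design point is that the answer is conditioned on retrieved text rather than on parametric memory alone, which is the basic idea behind retrieval-augmented and non-parametric approaches.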
Relevant publications:
- Srba, I., Pecher, B., Tomlein, M., Moro, R., Stefancova, E., Simko, J. and Bielikova, M., 2022, July. Monant medical misinformation dataset: Mapping articles to fact-checked claims. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2949-2959). https://dl.acm.org/doi/10.1145/3477495.3531726
- Pikuliak, M., Srba, I., Moro, R., Hromadka, T., Smolen, T., Melisek, M., Vykopal, I., Simko, J., Podrouzek, J. and Bielikova, M., 2023. Multilingual Previously Fact-Checked Claim Retrieval. https://arxiv.org/abs/2305.07991
The application domain can be, for example, support for fact-checking and combating disinformation, where the factuality of LLM outputs is critical.
The research will be performed at the Kempelen Institute of Intelligent Technologies (KInIT, https://kinit.sk) in Bratislava in cooperation with researchers from highly respected research units. A combined (external) form of study and full employment at KInIT is expected.
Supervising team
Michal Gregor is an expert researcher at KInIT. He focuses especially on artificial neural networks and deep learning, on reinforcement learning, and more recently on multi-modal learning and learning that involves language supervision. Michal also has experience in other areas of AI, such as metaheuristic optimization methods, representation of uncertain knowledge, probabilistic models, and more.
Jana Kosecka is a Professor at George Mason University. She is interested in computational models of vision systems, acquisition of static and dynamic models of environments by means of visual sensing, high-level semantic scene understanding, and human-computer interaction. She has held visiting positions at UC Berkeley, Stanford University, Google, and Nokia Research, and has served as program chair, area chair, or senior editorial board member for leading conferences in the field, including CVPR, ICCV, and ICRA.
Jana currently mentors our PhD student Ivana Beňová.