PhD Themes 2024: Improving Natural Language Processing

Supervising team: Marián Šimko (supervisor, KInIT), Jana Kosecka (George Mason University) or Martin Hurban (ČSOB)
Keywords: large language models, natural language processing, trustworthy NLP, multilingual learning, information extraction

The recent development of large language models (LLMs) shows the potential of deep learning and artificial neural networks for many natural language processing (NLP) tasks. Advances in automating these tasks have a significant impact on many innovative applications that affect everyday life.

Although large language models have been successfully used to solve a large number of tasks, several research challenges remain. These may be related to individual natural language processing tasks, application domains, or the languages themselves. In addition, new challenges stem from the nature of large language models and the so-called black-box nature of neural network-based models.

Further research and exploration of related phenomena are needed, with special attention to the problem of trustworthiness in NLP or to new learning paradigms that address the low availability of resources needed for learning (low-resource NLP).

Interesting research challenges that can be addressed within the topic include: 

  • Large language models and their properties (e.g., hallucination understanding)
  • Trustworthy NLP (e.g., bias mitigation, explainability of models)
  • Adapting large language models to a specific context and task (e.g., via PEFT, RAG)
  • Advanced learning techniques (e.g., transfer learning, multilingual learning)
  • Domain-specific information extraction and text classification (e.g., novel methods for sentiment analysis, improving conversation quality in chatbots)

The research will be performed at the Kempelen Institute of Intelligent Technologies (KInIT) in Bratislava, in cooperation with industrial partners or researchers from highly respected research units abroad, depending on the selected subtopic. A combined (external) form of study and full employment at KInIT is expected.

Supervising team

Marián Šimko, Lead researcher, KInIT

Marián Šimko is an expert researcher at KInIT. Marián focuses on natural language processing, information extraction, low-resource language processing and trustworthiness of neural models. He is a former vice-dean for Master’s study and alumni co-operation at the Slovak University of Technology.

Jana Kosecka, Professor, George Mason University, USA

Jana Kosecka is a Professor at George Mason University. She is interested in computational models of vision systems, acquisition of static and dynamic models of environments by means of visual sensing, high-level semantic scene understanding and human-computer interaction. She has held visiting positions at UC Berkeley, Stanford University, Google and Nokia Research, and has served as Program chair, Area chair or senior member of the editorial board for leading conferences in the field (CVPR, ICCV, ICRA).

Jana is currently a mentor of our PhD student Ivana Beňová.

Martin Hurban, Data Science Team Lead, ČSOB

Martin Hurban is the Data Science Team Lead at ČSOB. He coordinates AI project implementation within the bank and leads a team responsible for the natural language understanding capability of ČSOB’s digital companion Kate. During his Ph.D. in the area of solidification of multicomponent alloys, he taught unconstrained optimization, nonlinear programming and numerical methods.