PhD Themes 2024: Measuring Output Quality of Large Language Models

Author: Kempelen Institute of Intelligent Technologies

Jan 11, 2024

Supervising team: Jakub Šimko (supervisor, KInIT), Dominik Macko (KInIT)
Keywords: generative AI, large language models, dataset creation, dataset augmentation, machine generated text detection, metrics and evaluation, machine learning

The advent of large language models (LLMs) raises research questions about how to measure the quality and properties of their outputs. Such measures are needed for benchmarking, model improvement, and prompt engineering. Some evaluation techniques pertain to specific domains and usage scenarios (e.g., how accurate are the answers to factual questions in a given domain? how well can the generated answers be used to train a model for a specific task?), while others are more general (e.g., how diverse are the paraphrases generated by an LLM? how easy is it to detect that the content is machine-generated?).
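To make the notion of paraphrase diversity concrete, the following is a minimal sketch of one simple lexical measure, the share of unique n-grams across a set of generated texts (often called distinct-n). The function name `distinct_n`, the whitespace tokenization, and the example sentences are assumptions made for illustration; this is not necessarily the metric used in the publications listed below.

```python
def distinct_n(texts, n=2):
    """Return the share of unique n-grams among all n-grams in `texts` (0.0 to 1.0)."""
    ngrams = []
    for text in texts:
        # Naive lowercase whitespace tokenization (an assumption of this sketch).
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0  # no n-grams to compare, e.g. empty input or very short texts
    return len(set(ngrams)) / len(ngrams)


# Hypothetical paraphrases of one intent, used only to exercise the function.
paraphrases = [
    "book a flight to paris",
    "book a flight to rome",
    "reserve a plane ticket for paris",
]
unigram_diversity = distinct_n(paraphrases, n=1)
bigram_diversity = distinct_n(paraphrases, n=2)
```

Higher values indicate less lexical repetition across the set. A purely lexical score like this ignores meaning, so in practice it would be complemented by semantic measures.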

Through replication studies, benchmarking experiments, metric design, prompt engineering and other approaches, the candidate will advance the methods and experimental methodologies of LLM output quality measurement. Of particular interest are two general scenarios:

Dataset generation and/or augmentation, where LLMs are prompted with (comparatively small) sets of seeds to create much larger datasets. Such an approach can be very useful when dealing with a domain or task with limited availability of original (labelled) training data, such as disinformation detection.

Detection of generated content, where stylometric, deep learning-based, statistical, or hybrid methods are used to estimate whether a piece of content was generated or modified by a machine. The ability to detect such content is crucial for many real-world scenarios (e.g., detection of disinformation or fraud), and it also feeds back into research methodologies (e.g., detecting the presence of generated content in published datasets or in crowdsourced data).
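As a rough illustration of the stylometric end of the detection scenario, the sketch below extracts two simple text statistics that a detector could use as input features. The function name `stylometric_features`, the choice of features, and the regular expressions are assumptions made for illustration. The code only computes features; it is not a detector, and it does not reproduce the methods of any particular benchmark.

```python
import re


def stylometric_features(text):
    """Compute two basic stylometric statistics for a piece of English text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased alphabetic tokens (apostrophes kept for contractions).
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        return {"avg_sentence_len": 0.0, "type_token_ratio": 0.0}
    return {
        # Mean number of words per sentence.
        "avg_sentence_len": len(words) / len(sentences),
        # Vocabulary richness: unique words divided by all words.
        "type_token_ratio": len(set(words)) / len(words),
    }


features = stylometric_features("The cat sat. The cat ran.")
```

In a real pipeline, such features would be computed for labelled human-written and machine-generated texts and passed to a classifier; deep learning-based and hybrid detectors replace or extend these hand-crafted features.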

The candidate will select one of the two general scenarios (without being limited to it), identify and refine specific research questions, and answer them experimentally.

Relevant publications:

Cegin, J., Simko, J. and Brusilovsky, P., 2023. ChatGPT to Replace Crowdsourcing of Paraphrases for Intent Classification: Higher Diversity and Comparable Model Robustness. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. https://arxiv.org/pdf/2305.12947.pdf

Macko, D., Moro, R., Uchendu, A., Lucas, J.S., Yamashita, M., Pikuliak, M., Srba, I., Le, T., Lee, D., Simko, J. and Bielikova, M., 2023. MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. https://arxiv.org/pdf/2310.13606.pdf

The research will be performed at the Kempelen Institute of Intelligent Technologies (KInIT, https://kinit.sk) in Bratislava in cooperation with industrial partners or researchers from highly respected research units. A combined (external) form of study and full employment at KInIT is expected.
