Knowledge-sharing seminar on Reinforcement Learning
Branislav Kveton, Principal Research Scientist at Adobe Research, gave a lecture on Reinforcement Learning with Large Language Models Through Reward-Weighted Fine-Tuning.
Lecture abstract
Reinforcement learning (RL) with large language models (LLMs) has enabled recent progress in training reasoning models. In this work, we show how to reduce offline RL with LLMs to reward-weighted supervised fine-tuning (SFT). This allows practical RL optimisation of LLM agents using just SFT, arguably the most common approach for training LLMs. Unlike offline variants of other approaches, such as PPO and GRPO, we do not need token-level rewards or reward models, and avoid propensity score ratios in the objective. We demonstrate our approach on several LLM agent optimisation problems: increasing sales, improving recommendation accuracy, and learning to reason in question-answering agents. This is joint work with Subhojyoti Mukherjee, Viet Dac Lai, Raghavendra Addanki, Ryan Rossi, Seunghyun Yoon, Trung Bui, Anup Rao, and Jayakumar Subramanian.
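The core idea of the abstract — turning offline RL into reward-weighted SFT — can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a toy categorical policy instead of an LLM, and the choice to clip rewards to be nonnegative is an assumption made here so the weighted log-likelihood stays well-defined. Each logged action's log-probability is simply weighted by its reward, so no reward model, token-level rewards, or propensity ratios are needed.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reward_weighted_sft(actions, rewards, n_actions, lr=0.5, steps=200):
    """Fit a categorical 'policy' by reward-weighted maximum likelihood:
    maximize sum_i w_i * log pi_theta(a_i), with w_i a nonnegative weight
    derived from the logged reward (clipping is an assumption, not the
    paper's prescription)."""
    theta = np.zeros(n_actions)
    w = np.clip(np.asarray(rewards, dtype=float), 0.0, None)
    for _ in range(steps):
        p = softmax(theta)
        grad = np.zeros(n_actions)
        for a, wi in zip(actions, w):
            # Gradient of wi * log softmax(theta)[a] w.r.t. theta.
            g = -p.copy()
            g[a] += 1.0
            grad += wi * g
        theta += lr * grad / len(actions)  # gradient ascent on weighted log-likelihood
    return softmax(theta)

# Offline log: action 2 received high rewards, the others low.
actions = [0, 1, 2, 2, 1]
rewards = [0.1, 0.0, 1.0, 0.9, 0.2]
probs = reward_weighted_sft(actions, rewards, n_actions=3)
```

With an LLM, the same structure applies: the per-example weight multiplies the usual SFT cross-entropy loss over the response tokens, so any standard fine-tuning pipeline can be reused.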
Photos from the lecture


