{"id":37837,"date":"2025-06-17T09:15:00","date_gmt":"2025-06-17T07:15:00","guid":{"rendered":"https:\/\/kinit.sk\/event\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\/"},"modified":"2025-09-11T15:37:40","modified_gmt":"2025-09-11T13:37:40","slug":"knowledge-sharing-seminar-in-the-field-of-reinforcement-learning","status":"publish","type":"event","link":"https:\/\/kinit.sk\/sk\/podujatie\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\/","title":{"rendered":"Knowledge sharing seminar in the field of Reinforcement Learning"},"content":{"rendered":"<div id=\"\" class=\"element core-paragraph\">\n<p><strong>Branislav Kveton, Principal Research Scientist at Adobe Research, gave a lecture on <em>Reinforcement Learning with Large Language Models Through Reward-Weighted Fine-Tuning<\/em>.<\/strong><\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-heading\">\n<h4 class=\"wp-block-heading\">Lecture abstract<\/h4>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p>Reinforcement learning (RL) with large language models (LLMs) has enabled recent progress in training reasoning models. In this work, we show how to reduce offline RL with LLMs to reward-weighted supervised fine-tuning (SFT). This allows practical RL optimisation of LLM agents using just SFT, arguably the most common approach for training LLMs. Unlike offline variants of other approaches, such as PPO and GRPO, we do not need token-level rewards or reward models, and avoid propensity score ratios in the objective. We demonstrate our approach on several LLM agent optimisation problems: increasing sales, improving recommendation accuracy, and learning to reason in question-answering agents. This is joint work with Subhojyoti Mukherjee, Viet Dac Lai, Raghavendra Addanki, Ryan Rossi, Seunghyun Yoon, Trung Bui, Anup Rao, and Jayakumar Subramanian.<\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-heading\">\n<h2 class=\"wp-block-heading\">Photos from the lecture<\/h2>\n<\/div>\n\n<div id=\"\" class=\"element core-columns\">\n<div class=\"wp-block-columns has-small-font-size is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\"><div id=\"\" class=\"element core-column\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><div id=\"\" class=\"element core-image\">\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"768\" height=\"1024\" data-src=\"https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton3-1-768x1024.jpg\" alt=\"\" class=\"wp-image-37787 lazyload\" data-srcset=\"https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton3-1-768x1024.jpg 768w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton3-1-225x300.jpg 225w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton3-1-1152x1536.jpg 1152w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton3-1-1536x2048.jpg 1536w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton3-1-255x341.jpg 255w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton3-1-scaled.jpg 1920w\" data-sizes=\"(max-width: 768px) 100vw, 768px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 768px; --smush-placeholder-aspect-ratio: 768\/1024;\" \/><\/figure>\n<\/div><\/div>\n<\/div>\n\n<div id=\"\" class=\"element core-column\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><div id=\"\" class=\"element core-image\">\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"748\" height=\"997\" data-src=\"https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton1.jpg\" alt=\"\" class=\"wp-image-37781 lazyload\" data-srcset=\"https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton1.jpg 748w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton1-225x300.jpg 225w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton1-255x341.jpg 255w\" data-sizes=\"(max-width: 748px) 100vw, 748px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 748px; --smush-placeholder-aspect-ratio: 748\/997;\" \/><\/figure>\n<\/div><\/div>\n<\/div>\n\n<div id=\"\" class=\"element core-column\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><div id=\"\" class=\"element core-image\">\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"768\" height=\"1024\" data-src=\"https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton2-768x1024.jpg\" alt=\"\" class=\"wp-image-37785 lazyload\" data-srcset=\"https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton2-768x1024.jpg 768w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton2-225x300.jpg 225w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton2-1152x1536.jpg 1152w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton2-1536x2048.jpg 1536w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton2-255x341.jpg 255w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/09\/kveton2-scaled.jpg 1920w\" data-sizes=\"(max-width: 768px) 100vw, 768px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 768px; --smush-placeholder-aspect-ratio: 768\/1024;\" \/><\/figure>\n<\/div><\/div>\n<\/div><\/div>\n<\/div>","protected":false},"featured_media":37791,"template":"","meta":{"_acf_changed":true,"footnotes":""},"categories":[87],"class_list":["post-37837","event","type-event","status-publish","has-post-thumbnail","hentry","category-past-events-sk"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Knowledge sharing seminar in the field of Reinforcement Learning - KInIT<\/title>\n<meta name=\"description\" content=\"Branislav Kveton, Principal Research Scientist at Adobe Research, gave a lecture on Reinforcement Learning with Large Language Models...\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/kinit.sk\/sk\/podujatie\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\/\" \/>\n<meta property=\"og:locale\" content=\"sk_SK\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Knowledge sharing seminar in the field of Reinforcement Learning - KInIT\" \/>\n<meta property=\"og:description\" content=\"Branislav Kveton, Principal Research Scientist at Adobe Research, gave a lecture on Reinforcement Learning with Large Language Models...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/kinit.sk\/sk\/podujatie\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"KInIT\" \/>\n<meta property=\"article:modified_time\" content=\"2025-09-11T13:37:40+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/kinit.sk\/wp-content\/uploads\/2024\/10\/slovaks.ai-event_Knowledge-sharing-seminar-in-the-field-of-Reinforcement-Learning-scaled.png\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1340\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@kinit\" \/>\n<meta name=\"twitter:label1\" content=\"Predpokladan\u00fd \u010das \u010d\u00edtania\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 min\u00faty\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/podujatie\\\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\\\/\",\"url\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/podujatie\\\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\\\/\",\"name\":\"Knowledge sharing seminar in the field of Reinforcement Learning - KInIT\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/podujatie\\\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/podujatie\\\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/kinit.sk\\\/wp-content\\\/uploads\\\/2024\\\/10\\\/slovaks.ai-event_Knowledge-sharing-seminar-in-the-field-of-Reinforcement-Learning-scaled.png\",\"datePublished\":\"2025-06-17T07:15:00+00:00\",\"dateModified\":\"2025-09-11T13:37:40+00:00\",\"description\":\"Branislav Kveton, Principal Research Scientist at Adobe Research, gave a lecture on Reinforcement Learning with Large Language Models...\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/podujatie\\\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\\\/#breadcrumb\"},\"inLanguage\":\"sk-SK\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/kinit.sk\\\/sk\\\/podujatie\\\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"sk-SK\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/podujatie\\\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\\\/#primaryimage\",\"url\":\"https:\\\/\\\/kinit.sk\\\/wp-content\\\/uploads\\\/2024\\\/10\\\/slovaks.ai-event_Knowledge-sharing-seminar-in-the-field-of-Reinforcement-Learning-scaled.png\",\"contentUrl\":\"https:\\\/\\\/kinit.sk\\\/wp-content\\\/uploads\\\/2024\\\/10\\\/slovaks.ai-event_Knowledge-sharing-seminar-in-the-field-of-Reinforcement-Learning-scaled.png\",\"width\":2560,\"height\":1340},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/podujatie\\\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/uvod\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Past events\",\"item\":\"https:\\\/\\\/kinit.sk\\\/category\\\/past-events\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Knowledge sharing seminar in the field of Reinforcement Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/#website\",\"url\":\"https:\\\/\\\/kinit.sk\\\/\",\"name\":\"KInIT\",\"description\":\"Vyu\u017e\u00edvame v\u00fdskum pre \u013eud\u00ed a priemysel\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/kinit.sk\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"sk-SK\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Knowledge sharing seminar in the field of Reinforcement Learning - KInIT","description":"Branislav Kveton, Principal Research Scientist at Adobe Research, gave a lecture on Reinforcement Learning with Large Language Models...","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/kinit.sk\/sk\/podujatie\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\/","og_locale":"sk_SK","og_type":"article","og_title":"Knowledge sharing seminar in the field of Reinforcement Learning - KInIT","og_description":"Branislav Kveton, Principal Research Scientist at Adobe Research, gave a lecture on Reinforcement Learning with Large Language Models...","og_url":"https:\/\/kinit.sk\/sk\/podujatie\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\/","og_site_name":"KInIT","article_modified_time":"2025-09-11T13:37:40+00:00","og_image":[{"width":2560,"height":1340,"url":"https:\/\/kinit.sk\/wp-content\/uploads\/2024\/10\/slovaks.ai-event_Knowledge-sharing-seminar-in-the-field-of-Reinforcement-Learning-scaled.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_site":"@kinit","twitter_misc":{"Predpokladan\u00fd \u010das \u010d\u00edtania":"2 min\u00faty"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/kinit.sk\/sk\/podujatie\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\/","url":"https:\/\/kinit.sk\/sk\/podujatie\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\/","name":"Knowledge sharing seminar in the field of Reinforcement Learning - KInIT","isPartOf":{"@id":"https:\/\/kinit.sk\/#website"},"primaryImageOfPage":{"@id":"https:\/\/kinit.sk\/sk\/podujatie\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\/#primaryimage"},"image":{"@id":"https:\/\/kinit.sk\/sk\/podujatie\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/kinit.sk\/wp-content\/uploads\/2024\/10\/slovaks.ai-event_Knowledge-sharing-seminar-in-the-field-of-Reinforcement-Learning-scaled.png","datePublished":"2025-06-17T07:15:00+00:00","dateModified":"2025-09-11T13:37:40+00:00","description":"Branislav Kveton, Principal Research Scientist at Adobe Research, gave a lecture on Reinforcement Learning with Large Language Models...","breadcrumb":{"@id":"https:\/\/kinit.sk\/sk\/podujatie\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\/#breadcrumb"},"inLanguage":"sk-SK","potentialAction":[{"@type":"ReadAction","target":["https:\/\/kinit.sk\/sk\/podujatie\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\/"]}]},{"@type":"ImageObject","inLanguage":"sk-SK","@id":"https:\/\/kinit.sk\/sk\/podujatie\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\/#primaryimage","url":"https:\/\/kinit.sk\/wp-content\/uploads\/2024\/10\/slovaks.ai-event_Knowledge-sharing-seminar-in-the-field-of-Reinforcement-Learning-scaled.png","contentUrl":"https:\/\/kinit.sk\/wp-content\/uploads\/2024\/10\/slovaks.ai-event_Knowledge-sharing-seminar-in-the-field-of-Reinforcement-Learning-scaled.png","width":2560,"height":1340},{"@type":"BreadcrumbList","@id":"https:\/\/kinit.sk\/sk\/podujatie\/knowledge-sharing-seminar-in-the-field-of-reinforcement-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/kinit.sk\/sk\/uvod\/"},{"@type":"ListItem","position":2,"name":"Past events","item":"https:\/\/kinit.sk\/category\/past-events\/"},{"@type":"ListItem","position":3,"name":"Knowledge sharing seminar in the field of Reinforcement Learning"}]},{"@type":"WebSite","@id":"https:\/\/kinit.sk\/#website","url":"https:\/\/kinit.sk\/","name":"KInIT","description":"Vyu\u017e\u00edvame v\u00fdskum pre \u013eud\u00ed a priemysel","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/kinit.sk\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"sk-SK"}]}},"_links":{"self":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/event\/37837","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/event"}],"about":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/types\/event"}],"version-history":[{"count":1,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/event\/37837\/revisions"}],"predecessor-version":[{"id":38011,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/event\/37837\/revisions\/38011"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/media\/37791"}],"wp:attachment":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/media?parent=37837"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/categories?post=37837"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}