Matej Čief

Research areas: machine learning, reinforcement learning, deep learning, natural language processing, recommender systems, user modelling

Position: PhD Student

Matej is a PhD student focused on recommender systems, specifically off-policy learning and evaluation. In his research he designs estimators and data-gathering policies that can evaluate recommendation algorithms without deploying the whole system online. He is supervised by Michal Kompan (KInIT) and Branislav Kveton (Amazon’s lab in Berkeley).

He holds a Master’s degree in Intelligent Software Systems from the Slovak University of Technology. During his studies, he worked on multiple research projects, including User Modeling, and wrote his master’s thesis on the use of semi-supervised learning for fake news detection. Matej gained experience in applied machine learning research as a data scientist, focusing mainly on problems in natural language processing and time series forecasting.

PhD topic: Recommender and adaptive web-based systems

Supervising team: Michal Kompan (KInIT), Branislav Kveton (Amazon’s lab in Berkeley)

Off-policy learning and evaluation for contextual multi-armed bandits are a highly desirable approach to improving the performance of recommender systems, because the process runs offline, that is, without directly interacting with users, which prevents the deployment of sub-optimal policies. Despite these benefits, the accuracy of policy evaluation and the computational complexity of optimizing over the policy space prevent wider use of this approach in practice. Off-policy learning and evaluation struggle most when the action space is large and the number of available interactions is limited. In this work, we address three problems:

(1) Off-policy evaluation with large action spaces: we propose a novel approach that learns action embeddings, resulting in a provably better bias-variance trade-off.

(2) Overconfidence in off-policy learning for combinatorial action spaces: we apply pessimistic value estimation while keeping the optimization computationally tractable.

(3) Bridging the gap between offline and online learning: we study the safe optimal design of data-gathering policies for learning-to-rank problems.
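To make the idea of off-policy evaluation concrete, here is a minimal sketch of the classic inverse propensity scoring (IPS) estimator, the standard baseline that the approaches above build on. This is an illustrative toy, not Matej's method: the logging policy, reward means, and sample sizes are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic logged data (contexts omitted for brevity): a uniform logging
# policy chooses among K actions and records actions, rewards, and the
# propensity (probability) with which each action was logged.
K = 5
n = 10_000
logging_probs = np.full(K, 1.0 / K)              # uniform logging policy
actions = rng.integers(0, K, size=n)             # logged actions
true_means = np.linspace(0.1, 0.5, K)            # hypothetical reward means
rewards = rng.binomial(1, true_means[actions])   # logged binary rewards

# Target policy we want to evaluate offline: always plays the last action.
target_probs = np.zeros(K)
target_probs[K - 1] = 1.0

# IPS: reweight each logged reward by the ratio of target to logging
# propensities. Unbiased when the logging policy has full support over
# the target policy's actions, but high-variance for large action spaces
# and small logs -- exactly the regime the abstract targets.
weights = target_probs[actions] / logging_probs[actions]
ips_estimate = np.mean(weights * rewards)

print(ips_estimate)  # close to the target policy's true value, 0.5
```

With large action spaces the importance weights explode, which is why estimators that share information across actions (e.g. via learned action embeddings) or that estimate pessimistic lower bounds on the policy value can achieve a better bias-variance trade-off.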