Position: Recommender and adaptive web-based systems
This work addresses fundamental problems in off-policy evaluation and learning for multi-armed bandit systems, namely:
(1) off-policy evaluation with large action spaces,
(2) overconfidence in off-policy optimization for structured recommendations, and
(3) designing safe and optimal data-gathering policies.
Supervised by Branislav Kveton, a principal scientist at Amazon whose work focuses on online and offline bandit algorithms.