Cross-Validated Off-Policy Evaluation

Cief, M., Kveton, B.¹, Kompan, M.

¹ Adobe Research

We study estimator selection and hyper-parameter tuning in off-policy evaluation. Although cross-validation is the most popular method for model selection in supervised learning, off-policy evaluation relies mostly on theory, which provides only limited guidance to practitioners. We show how to use cross-validation for off-policy evaluation. This challenges a popular belief that cross-validation in off-policy evaluation is not feasible. We evaluate our method empirically and show that it addresses a variety of use cases.

Cite: Cief, M., Kveton, B., & Kompan, M. (2025). Cross-validated off-policy evaluation. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 39, No. 15, pp. 16073-16081).

Authors

Matej Čief
PhD Student 09/2021-06/2025
Michal Kompan
Lead and Researcher