
Policy Learning with Continuous Actions Under Unmeasured Confounding

Author

Ruoqing Zhu

Co-author

Yuhan Li
Eugene Han
Wenzhuo Zhou
Yifan Cui
Zhengling Qi

Conference

65th ISI World Statistics Congress

Format: IPS Abstract - WSC 2025

Keywords: causal inference, heterogeneous treatment effect, personalized medicine, reinforcement learning

Session: IPS 736 - Causal Inference for Complex Data

Thursday 9 October 2 p.m. - 3:40 p.m. (Europe/Amsterdam)

Abstract

We consider offline policy learning with a continuous action space in the presence of unmeasured confounders. Existing work typically focuses on policy evaluation under partially observable Markov decision processes (POMDPs) and considers discrete action spaces. In this work, we first establish a novel identification result that allows the policy value of a given target policy to be estimated nonparametrically under an infinite horizon. Based on this identification result, we construct a minimax estimator of the policy value and develop a policy-gradient-based algorithm to search for the in-class optimal policy, that is, the policy within a given class that maximizes the estimated policy value. We investigate the consistency, finite-sample error bound, and regret bound of the induced optimal policy. The performance of the proposed method is illustrated through extensive simulations and a real data analysis of family panel studies.
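The abstract does not spell out the algorithm, so the following is only a generic illustration of the last step it describes: gradient ascent over a parameterized class of continuous-action policies to maximize an estimated policy value. It is a minimal sketch under loud assumptions. The authors' minimax estimator is not reproduced; the function `q_hat` is a hypothetical stand-in for whatever fitted value estimate is available, the linear deterministic policy class is an illustrative choice, and all names (`q_hat`, `policy`, `search_policy`, etc.) are hypothetical.

```python
import random

def q_hat(s, a):
    # HYPOTHETICAL stand-in for an estimated value of action a in state s.
    # It is NOT the paper's minimax estimator; it simply peaks at a = 2s.
    return 1.0 - (a - 2.0 * s) ** 2

def dq_da(s, a):
    # Derivative of q_hat with respect to the continuous action a.
    return -2.0 * (a - 2.0 * s)

def policy(theta, s):
    # Illustrative linear deterministic policy class: a = theta[0] + theta[1] * s.
    return theta[0] + theta[1] * s

def estimated_value(theta, states):
    # Plug-in estimate of the policy value: average q_hat over observed states.
    return sum(q_hat(s, policy(theta, s)) for s in states) / len(states)

def policy_gradient(theta, states):
    # Chain rule: dV/dtheta = mean over s of dq/da * da/dtheta, where da/dtheta = (1, s).
    g0 = g1 = 0.0
    for s in states:
        d = dq_da(s, policy(theta, s))
        g0 += d
        g1 += d * s
    n = len(states)
    return [g0 / n, g1 / n]

def search_policy(states, lr=0.5, steps=500):
    # Plain gradient ascent on the estimated value, starting from theta = (0, 0).
    theta = [0.0, 0.0]
    for _ in range(steps):
        g = policy_gradient(theta, states)
        theta = [theta[0] + lr * g[0], theta[1] + lr * g[1]]
    return theta

rng = random.Random(0)
states = [rng.uniform(-1.0, 1.0) for _ in range(200)]
theta = search_policy(states)
print(theta, estimated_value(theta, states))
```

Because this toy `q_hat` is maximized by the action a = 2s, the search should converge to a parameter vector close to (0, 2) with estimated value close to 1. In the setting of the abstract, the gradient would instead be taken through the value estimate obtained from the identification result, which this sketch does not attempt to model.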