Policy Learning with Continuous Actions Under Unmeasured Confounding
Conference: 65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Keywords: causal inference, heterogeneous treatment effect, personalized medicine, reinforcement learning
Session: IPS 736 - Causal Inference for Complex Data
Thursday 9 October, 2:00 p.m. – 3:40 p.m. (Europe/Amsterdam)
Abstract
We consider offline policy learning with a continuous action space in the presence of unmeasured confounders. Existing work typically focuses on policy evaluation under partially observable Markov decision processes (POMDPs) and considers discrete action spaces. In this work, we first establish a novel identification result that allows the policy value of a given target policy to be estimated nonparametrically over an infinite horizon. Building on this identification result, we construct a minimax estimator and develop a policy-gradient-based algorithm to search for the in-class optimal policy that maximizes the estimated policy value. We investigate the consistency, finite-sample error bound, and regret bound of the induced optimal policy. The performance of the proposed method is illustrated through extensive simulations and a real-data analysis of family panel studies.
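
To fix ideas, the short Python sketch below illustrates the generic policy-search recipe the abstract describes: a parameterized (here Gaussian) policy class, an offline estimate of the policy value, and gradient ascent on that estimate. It is not the paper's method: the minimax estimator is replaced by a simple self-normalized importance-sampling plug-in, the behavior policy is treated as known, and the data, constants, and names (beta_b, SIG_T, the learning rate) are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic offline data; the behavior policy and reward model are assumptions.
    n, d = 2000, 3
    S = rng.normal(size=(n, d))                       # observed states
    beta_b = np.array([0.5, -0.3, 0.2])               # behavior-policy mean coefficients
    A = S @ beta_b + rng.normal(size=n)               # continuous actions
    R = -(A - S @ np.array([1.0, 0.5, -0.5]))**2 + 0.1 * rng.normal(size=n)

    SIG_T, SIG_B = 0.5, 1.0   # target-policy scale; behavior scale (assumed known)

    def log_pi(coef, sig):
        # Log density (up to a shared constant) of the Gaussian policy
        # a ~ N(s @ coef, sig^2), evaluated at the logged state-action pairs.
        return -0.5 * ((A - S @ coef) / sig) ** 2 - np.log(sig)

    def value_and_grad(theta):
        # Self-normalized importance-sampling estimate of the policy value,
        # standing in for the paper's minimax estimator, with its exact gradient
        # via the score function d/dtheta log pi_theta = (a - s @ theta) / sig^2 * s.
        w = np.exp(log_pi(theta, SIG_T) - log_pi(beta_b, SIG_B))
        w /= w.sum()
        v = w @ R
        score = ((A - S @ theta) / SIG_T**2)[:, None] * S
        g = (w * (R - v)) @ score
        return v, g

    theta = np.zeros(d)
    for _ in range(500):      # gradient ascent on the estimated policy value
        v, g = value_and_grad(theta)
        theta += 0.01 * g
    print(f"estimated value {v:.3f}, learned theta {np.round(theta, 2)}")

In this toy setup the in-class search is over the mean coefficients theta of a Gaussian policy; the same outer loop applies with the plug-in value estimate swapped for a minimax estimator of the kind the abstract constructs.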