An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits
Conference
65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Keywords: adaptive, causal inference, experimental design, sequential test
Session: IPS 812 - Experimental and Observational Causal Inference in the Tech Industry
Monday 6 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)
Abstract
Experimentation is crucial for managers to rigorously quantify the value of a change and determine whether it leads to a statistically significant improvement over the status quo, thus augmenting their decision-making. Many companies now mandate that all changes undergo experimentation, presenting two challenges: (1) reducing the risk and cost of experimentation by minimizing the proportion of customers assigned to the inferior treatment and (2) increasing experimentation velocity by enabling managers to stop experiments based on diagnostic metrics while the experiment is running. This paper addresses both challenges simultaneously by proposing the Mixture Adaptive Design (MAD), a new experimental design for multi-armed bandit (MAB) algorithms that enables anytime-valid inference on the Average Treatment Effect (ATE) for any MAB algorithm. Intuitively, the MAD "mixes" any bandit algorithm with a Bernoulli design such that at each time step, the probability that a customer is assigned treatment via the Bernoulli design is controlled by a user-specified deterministic sequence that can converge to zero. The sequence enables managers to directly and interpretably control the trade-off between regret minimization and inferential precision. Under mild conditions on the rate at which the sequence converges to zero, we provide a confidence sequence that is asymptotically anytime-valid and demonstrate that the MAD is guaranteed to have a finite stopping time as long as the true ATE converges to a non-zero value. Hence, the MAD allows managers to stop experiments early when a significant ATE is detected while ensuring valid inference, enhancing both the efficiency and reliability of adaptive experiments. Empirically, we demonstrate that the MAD achieves finite-sample anytime-validity while accurately and precisely estimating the ATE, all without incurring significant losses in reward compared to standard bandit designs.
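The mixing step described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify any of the following, so all are hypothetical choices: the mixing sequence `delta(t) = t**(-0.24)`, the Bernoulli(1/2) design, the epsilon-greedy rule standing in for "any MAB algorithm", the simulated Bernoulli rewards, and the inverse-probability-weighted (IPW) estimate of the ATE. The confidence sequence itself is not reproduced here.

```python
import random


def delta(t, power=0.24):
    """Hypothetical mixing sequence delta_t = t^(-power); it decays to zero."""
    return t ** (-power)


def bandit_prob_treat(mean_treat, mean_ctrl, eps=0.1):
    """Epsilon-greedy stand-in for the bandit: probability of assigning treatment."""
    if mean_treat >= mean_ctrl:
        return 1 - eps / 2
    return eps / 2


def mad_prob_treat(t, p_bandit, bernoulli_p=0.5):
    """Mix the Bernoulli design (weight delta_t) with the bandit (weight 1 - delta_t)."""
    d = delta(t)
    return d * bernoulli_p + (1 - d) * p_bandit


def simulate(n_steps=5000, mu_treat=0.6, mu_ctrl=0.5, seed=0):
    """Run the mixed design on simulated Bernoulli rewards (assumed reward model).

    Returns (IPW estimate of the ATE, share of units assigned to treatment).
    """
    rng = random.Random(seed)
    sums = [0.0, 0.0]    # running reward totals: index 0 = control, 1 = treatment
    counts = [0, 0]      # number of units assigned to each arm
    ipw_total = 0.0
    for t in range(1, n_steps + 1):
        mean_c = sums[0] / counts[0] if counts[0] else 0.0
        mean_t = sums[1] / counts[1] if counts[1] else 0.0
        # Assignment probability is known at every step, which IPW requires.
        p = mad_prob_treat(t, bandit_prob_treat(mean_t, mean_c))
        w = 1 if rng.random() < p else 0
        y = 1.0 if rng.random() < (mu_treat if w else mu_ctrl) else 0.0
        sums[w] += y
        counts[w] += 1
        # IPW contribution: y / p for treated units, -y / (1 - p) for controls.
        ipw_total += y / p if w else -y / (1 - p)
    return ipw_total / n_steps, counts[1] / n_steps


if __name__ == "__main__":
    ate_hat, treated_share = simulate()
    print(f"IPW ATE estimate: {ate_hat:.3f}, share treated: {treated_share:.3f}")
```

In this sketch a slower decay (smaller `power`) keeps more weight on the Bernoulli design, which favors estimation precision, while a faster decay hands control to the bandit sooner, which favors reward. This mirrors the trade-off the abstract attributes to the user-specified sequence.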