STEDII: A practical framework for good metrics with focus on experimentation and LLMs
Conference
65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Keywords: a/b, analysis, causal inference, large language models
Session: IPS 812 - Experimental and Observational Causal Inference in the Tech Industry
Monday 6 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)
Abstract
Good metrics are critical for making good decisions, but how do you know whether your metric set is good for making decisions in your organization? In this talk, I try to answer this question. I introduce the STEDII (Sensitivity, Trustworthiness, Efficiency, Debuggability, Interpretability, and Inclusivity) framework to define and evaluate the desirable properties of a metric and of an A/B test analysis in general. Each of these properties is essential, and together they reinforce one another to ensure a good set of metrics for a proper analysis of an A/B test, which in turn yields valuable insights and enables good product decisions. I will use examples from LLM-based and non-LLM-based products and features to illustrate some key points of the framework.