65th ISI World Statistics Congress 2025

Advances in Handling Missing Data for EHR and Causal Inference

Organiser

Qixuan Chen

Participants

  • Prof. Liangyuan Hu
    (Chair)

  • Prof. Qixuan Chen
    (Presenter/Speaker)
    Multiple imputation of Patient Health Questionnaire-9 for depression screening in EHRs: A pattern-mixture ordinal logistic Bayesian additive regression trees model

  • Dr Rebecca Anthopolos
    (Presenter/Speaker)
    A Bayesian nonparametric model with an informative visit process: Using electronic health records to evaluate body mass index for diabetes screening in Asian Americans

  • Prof. Qi Long
    (Presenter/Speaker)
    Mining for equitable health: Tackling biased, incomplete data in electronic health records

  • Rohit Bhattacharya
    (Presenter/Speaker)
    Causal inference with outcome-dependent missingness and self-censoring

  • Vincent Tan
    (Presenter/Speaker)
    Accounting for selection bias due to censoring by death

  • Category: International Statistical Institute

    Proposal Description

    Missing data are prevalent, affecting both randomized controlled trials and observational studies. They pose a significant challenge in the analysis of electronic health records (EHRs). Because EHRs are not originally collected for research, they present unique challenges for missing data handling, including data recorded at irregular intervals and at frequencies that vary across measures. Causal inference faces similar hurdles, since most existing methods are tailored for complete datasets. This gap underscores the need to develop causal inference methods that accommodate incomplete data.

    In this invited session, five distinguished speakers will showcase their latest research on addressing missing data, with applications in EHR analysis and causal inference. Professor Qi Long, from the University of Pennsylvania, will share his recent research on addressing biased, incomplete data in EHRs, including more accurate assessment of the harmful impact of incomplete EHR data on algorithmic fairness, the challenges associated with mitigating such bias, and potential strategies. Professor Rebecca Anthopolos, from New York University, will present a Bayesian nonparametric joint model of longitudinal BMI and time to diabetes diagnosis that uses longitudinal EHR data to evaluate the effectiveness of various static BMI cutoffs versus patient BMI trajectories for diabetes screening in Asian Americans. To account for an informative visit process, in which a patient’s visit pattern may be associated with underlying health status, the model includes a recurrent event submodel for the gap times between a patient’s clinic visits. To address missing data from depression screenings recorded in EHRs during routine clinical care, Professor Qixuan Chen, from Columbia University, will present an ordinal logistic Bayesian additive regression trees model within a pattern-mixture framework. This model is designed to multiply impute missing scores in patient health questionnaires. Professor Rohit Bhattacharya, from Williams College, will consider missingness in the context of causal inference when the outcome of interest may be missing. He will present a test to verify identification assumptions that are sufficient to correct for both self-censoring and confounding bias when using the shadow variable method. Finally, Dr. Vincent Tan, from Vertex Pharmaceuticals, will present his research on accounting for selection bias due to censoring by death in causal inference, using a multiple imputation approach that generates counterfactual predictive distributions of principal strata to estimate survivor average causal effects.