Advances in Handling Missing Data for EHR and Causal Inference
Conference
Category: International Statistical Institute
Proposal Description
Missing data are prevalent in both randomized controlled trials and observational studies, and they pose a significant challenge in the analysis of electronic health records (EHRs). Because EHRs are not collected for research purposes, they present unique challenges for missing data handling, including measurements recorded at irregular intervals and at frequencies that vary across measures. Missing data also create obstacles in causal inference, where most existing methods are designed for complete datasets. This gap underscores the need for causal inference methods that accommodate incomplete data.
In this invited session, five distinguished speakers will present their latest research on missing data, with applications in EHR analysis and causal inference. Professor Qi Long, from the University of Pennsylvania, will share his recent research on biased, incomplete EHR data. His talk covers a more accurate assessment of how incomplete EHR data harm algorithmic fairness, the challenges of mitigating such bias, and potential strategies for doing so. Professor Rebecca Anthopolos, from New York University, will present a Bayesian nonparametric joint model of longitudinal BMI and time to diabetes diagnosis, fitted to longitudinal EHR data, to compare the effectiveness of static BMI cutoffs with patient BMI trajectories for diabetes screening in Asians. To account for an informative visit process, in which the timing of a patient's visits may be associated with underlying health status, the model includes a recurrent event submodel for the gap times between a patient's clinic visits. To address missing data in depression screenings recorded in EHRs during routine clinical care, Professor Qixuan Chen, from Columbia University, will present an ordinal logistic Bayesian Additive Regression Trees model within a pattern-mixture framework, designed to impute multiple missing scores in patient health questionnaires. Professor Rohit Bhattacharya, from Williams College, will consider causal inference when the outcome of interest may be missing. He will present a test of identification assumptions that are sufficient to correct for both self-censoring and confounding bias when using the shadow variable method. Finally, Dr. Vincent Tan, from Vertex Pharmaceuticals, will present his research on accounting for selection bias due to censoring by death. His approach uses multiple imputation to generate counterfactual predictive distributions of principal strata, which are then used to estimate survivor average causal effects.
Submissions
- A Bayesian Nonparametric Model with an Informative Visit Process: Using Electronic Health Records to Evaluate Body Mass Index for Diabetes Screening
- Accounting for selection bias due to death in estimating the effect of wealth shock on cognition for the Health and Retirement Study
- Causal Inference With Outcome-Dependent Missingness And Self-Censoring
- Mining for equitable health: Tackling biased, incomplete data in electronic health records