Mining for equitable health: Tackling biased, incomplete data in electronic health records
Conference
65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Keywords: algorithmic-fairness, bias, electronic-health-records, health-inequity, missing-data
Session: IPS 806 - Advances in Handling Missing Data for EHR and Causal Inference
Monday 6 October, 2 p.m. - 3:40 p.m. (Europe/Amsterdam)
Abstract
Electronic health records (EHR), routinely collected as part of healthcare delivery, have great potential for advancing precision medicine. They contain multiple years of health information that can be leveraged for risk prediction, disease detection, and treatment evaluation. However, they lack a consistent, standardized format across institutions, particularly in the United States, and present significant analytical challenges: they contain multi-scale data from heterogeneous domains and include both structured and unstructured data. Data for individual patients are collected at irregular time intervals and with varying frequencies. In addition, EHR can reflect inequity: for example, patients with less access to healthcare, often people of color or those with lower socioeconomic status, tend to have more incomplete data in EHR. Many of these issues can contribute to biased data in EHR. In this talk, I will share our recent research on addressing biased, incomplete data in EHR, including a more accurate assessment of the harmful impact of incomplete EHR data on algorithmic fairness, the challenges associated with mitigating such bias, and potential strategies.