EHR-driven prognostic models to assist disease screening
Conference
65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Keywords: early cancer detection, electronic health records, non-small cell lung cancer, risk prediction model
Session: IPS 950 - Novel Statistical Approaches in Biomarker Discovery, Analysis & Disease Screening
Tuesday 7 October 2 p.m. - 3:40 p.m. (Europe/Amsterdam)
Abstract
Rationale: Specific patient characteristics increase the risk of cancer, necessitating personalized healthcare approaches. For high-risk individuals, tailored clinical management ensures proactive monitoring and timely intervention. Electronic health record (EHR) data are crucial for supporting these personalized approaches, thereby improving cancer prevention and early diagnosis.
Objectives: We leverage EHR data to build a prediction model for the early detection of non-small cell lung cancer (NSCLC).
Methods: We use data from the Mass General Brigham EHR and implement a three-stage ensemble learning approach. First, we generate risk scores using multivariable logistic regression under self-control and case-control designs to distinguish cases from controls. These risk scores are then integrated and calibrated using a prospective Cox model to produce the final risk prediction model.
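The abstract does not give implementation details, so the following is only a minimal, dependency-free sketch of a three-stage scheme of this shape, under several assumptions: two logistic models fitted on two hypothetical labelled samples stand in for the self-control and case-control designs (whose sample construction the abstract does not describe); their linear predictors become the covariates of a Cox model; and "calibration" is taken to mean converting Cox relative hazards into absolute one-year risk with the Breslow baseline cumulative hazard. The simulated cohort and all function names (`fit_logistic`, `fit_cox`, `absolute_risk`, `simulate`) are hypothetical, and plain gradient ascent replaces the Newton-type solvers normally used. This is not the authors' implementation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(a * b for a, b in zip(w, x))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Stages 1-2 (sketch): multivariable logistic regression by batch gradient
    ascent on the log-likelihood. Returns [intercept, coef_1, ..., coef_p]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + dot(w[1:], xi))
            grad[0] += err
            for j in range(p):
                grad[j + 1] += err * xi[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def fit_cox(S, time, event, lr=0.1, epochs=500):
    """Stage 3 (sketch): Cox model on the risk-score covariates S, fitted by
    gradient ascent on the log partial likelihood (assumes no tied event times).
    Returns (coefficients, Breslow baseline cumulative hazard function)."""
    n, k = len(S), len(S[0])
    order = sorted(range(n), key=lambda i: -time[i])  # latest time first
    b = [0.0] * k
    for _ in range(epochs):
        grad = [0.0] * k
        denom, num = 0.0, [0.0] * k
        for i in order:  # running sums over the risk set {j: time[j] >= time[i]}
            r = math.exp(dot(b, S[i]))
            denom += r
            num = [nm + r * s for nm, s in zip(num, S[i])]
            if event[i]:
                grad = [g + s - nm / denom for g, s, nm in zip(grad, S[i], num)]
        b = [bm + lr * g / n for bm, g in zip(b, grad)]
    # Breslow increments: 1 / (sum of exp(b.s) over the risk set) at each event.
    increments, denom = [], 0.0
    for i in order:
        denom += math.exp(dot(b, S[i]))
        if event[i]:
            increments.append((time[i], 1.0 / denom))

    def baseline_cumhaz(t):
        return sum(h for ti, h in increments if ti <= t)

    return b, baseline_cumhaz

def absolute_risk(b, baseline_cumhaz, s, horizon):
    """Calibrated risk of an event by `horizon`: 1 - exp(-H0(horizon) * exp(b.s))."""
    return 1.0 - math.exp(-baseline_cumhaz(horizon) * math.exp(dot(b, s)))

def simulate(rng, n):
    """Hypothetical cohort: two standardized features, exponential event
    times, administrative censoring at 5 years."""
    X, time, event = [], [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        t = rng.expovariate(0.1 * math.exp(1.0 * x[0] + 0.5 * x[1]))
        X.append(x)
        time.append(min(t, 5.0))
        event.append(1 if t <= 5.0 else 0)
    return X, time, event

rng = random.Random(0)
X, time, event = simulate(rng, 400)
# Stages 1-2: two logistic models on two hypothetical labelled samples
# (stand-ins for the self-control and case-control designs).
w_a = fit_logistic(X[:200], event[:200])
w_b = fit_logistic(X[200:], event[200:])
scores = [[dot(w_a[1:], x), dot(w_b[1:], x)] for x in X]
# Stage 3: integrate both scores and calibrate with a Cox model.
b, H0 = fit_cox(scores, time, event)

def one_year_risk(x):
    return absolute_risk(b, H0, [dot(w_a[1:], x), dot(w_b[1:], x)], 1.0)

high, low = one_year_risk([2.0, 2.0]), one_year_risk([-2.0, -2.0])
print(f"one-year risk: high-feature {high:.3f}, low-feature {low:.3f}")
```

The Cox stage matters for calibration: the logistic scores only rank individuals, whereas the baseline cumulative hazard anchors them to an absolute risk over a stated horizon. A real analysis would use an established routine (for example statsmodels' `PHReg` or lifelines' `CoxPHFitter`) that also handles tied event times.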
Results: We identified 127 EHR-derived features predictive of early NSCLC. The most predictive features include smoking, relevant laboratory test results, and chronic lung diseases. For predicting one-year NSCLC risk, the model achieved an area under the ROC curve (AUC) of 0.801, with a positive predictive value (PPV) of 0.0173 at a specificity of 0.02, in a population aged 18 and above, and an AUC of 0.757, with a PPV of 0.0196 at a specificity of 0.02, in a population aged 40 and above.
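Both reported metrics can be computed from predicted risks and observed one-year outcomes. The sketch below is illustrative only, assuming binary labels (1 = NSCLC diagnosed within one year) and an operating point that flags everyone whose score exceeds a threshold chosen to reach a target specificity among controls; the function names and toy data are hypothetical. AUC is computed as the Mann-Whitney probability that a randomly chosen case outranks a randomly chosen control.

```python
import math

def auc(scores, labels):
    """AUC as P(score of a random case > score of a random control); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def ppv_at_specificity(scores, labels, target_spec):
    """Pick the lowest threshold whose specificity reaches `target_spec`, flag
    everyone scoring above it, and return (threshold, specificity, PPV)."""
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    k = math.ceil(target_spec * len(neg))  # controls that must score <= threshold
    thr = neg[k - 1] if k > 0 else -math.inf
    flagged = [y for s, y in zip(scores, labels) if s > thr]
    tp = sum(flagged)                      # flagged cases
    fp = len(flagged) - tp                 # flagged controls
    spec = (len(neg) - fp) / len(neg)      # TN / (TN + FP)
    ppv = tp / len(flagged) if flagged else math.nan
    return thr, spec, ppv

# Toy predicted risks and observed outcomes (hypothetical).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(round(auc(scores, labels), 4))             # 0.9333
print(ppv_at_specificity(scores, labels, 0.7))   # (0.4, 0.8, 0.75)
```

Because PPV depends on disease prevalence as well as on the model, it is typically reported at a stated specificity (or flag rate), which is why each PPV above is paired with one.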
Conclusions: This study identified EHR-derived features that are predictive of early NSCLC diagnosis. The developed risk prediction model outperforms a baseline model that relies only on demographic and smoking information for the early detection of NSCLC, demonstrating the potential of EHR-derived features for personalized cancer screening recommendations and early detection.