Recent Advances in Missing Data Methods for Health Research
Conference
Category: International Statistical Institute
Proposal Description
In health research, data are often extracted from electronic medical records (EMR) or collected through web-based surveys. Missing data are inevitable in EMR data, particularly for biomarker data, which are often collected only from subjects susceptible to the disease of interest. As a result, subjects with biomarker data are more likely to have abnormal biomarker levels, i.e., the data are missing not at random (MNAR). Because social media is popular and convenient, an increasing number of health researchers use it to conduct surveys for health research. Social media-based surveys usually lack a well-defined probability sampling structure and have higher nonresponse and coverage errors than traditional survey methods, so the resulting data are likely to be subject to selection bias. Ignoring potential MNAR and selection bias in data analysis can produce biased results and lead to questionable scientific conclusions. In addition to potential MNAR and selection bias, complex data structures, such as high-dimensional clustered data, may arise in health research depending on how the data are collected and how many variables are captured. Handling missing data is especially challenging in high-dimensional settings and requires specialized methods to overcome the computational burden.
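To make the MNAR concern concrete, the following minimal simulation sketches how a complete-case analysis can be biased when the chance of a biomarker being measured depends on its own value. The logistic measurement mechanism, its coefficients, and all names (`simulate_mnar_bias`, `p_observed`) are hypothetical and chosen only for illustration; they are not taken from any of the session's submissions.

```python
import numpy as np


def simulate_mnar_bias(n=100_000, seed=0):
    """Compare the true biomarker mean with the mean among measured subjects."""
    rng = np.random.default_rng(seed)

    # Hypothetical standardized biomarker for the full population.
    biomarker = rng.normal(loc=0.0, scale=1.0, size=n)

    # Assumed MNAR mechanism: subjects with higher (more abnormal) values
    # are more likely to have the biomarker measured at all.
    p_observed = 1.0 / (1.0 + np.exp(-(biomarker - 0.5)))
    observed = rng.random(n) < p_observed

    return biomarker.mean(), biomarker[observed].mean(), observed.mean()


true_mean, complete_case_mean, frac_observed = simulate_mnar_bias()
print(f"true mean:          {true_mean:.3f}")
print(f"complete-case mean: {complete_case_mean:.3f}")
print(f"fraction observed:  {frac_observed:.3f}")
```

Under this assumed mechanism the complete-case mean sits noticeably above the population mean, which is the kind of bias the session's methods aim to address.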
In this session, we will present recent advances in missing data methods, particularly imputation methods, for health research using EMR, survey, or high-dimensional data. Machine learning methods are popular for analyzing data with complex structures (such as high-dimensional data) and typically rely on the missing at random (MAR) assumption to handle missing data. However, the missingness mechanism cannot be verified from the observed data, and the data may in fact be missing not at random. The multiple imputation-based sensitivity analysis method derived from Heckman’s selection model, which is presented in this session, can be readily adapted to handle missing data subject to MNAR or selection bias in machine learning.
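As a rough illustration of what a multiple imputation-based sensitivity analysis looks like in practice, the sketch below uses a simple delta-adjustment: values are imputed from a normal linear model fitted under MAR and then shifted by a sensitivity parameter `delta`, and the results are pooled with Rubin's rules. This is a deliberately simpler technique than the Heckman selection model-based method presented in the session, and it is not that method. The function name `delta_adjusted_mi`, the simulated data, and the grid of `delta` values are all assumptions made for the example.

```python
import numpy as np


def delta_adjusted_mi(y, x, delta, n_imputations=20, seed=0):
    """Pooled mean of y and its standard error, imputing missing y under a shift `delta`."""
    rng = np.random.default_rng(seed)
    missing = np.isnan(y)

    # Design matrices (intercept + one covariate) for observed and missing rows.
    X_obs = np.column_stack([np.ones((~missing).sum()), x[~missing]])
    X_mis = np.column_stack([np.ones(missing.sum()), x[missing]])
    y_obs = y[~missing]

    # Least-squares fit on the observed cases (the MAR imputation model).
    n_obs, p = X_obs.shape
    XtX_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = XtX_inv @ X_obs.T @ y_obs
    resid = y_obs - X_obs @ beta_hat

    estimates, variances = [], []
    for _ in range(n_imputations):
        # Draw model parameters so imputations reflect parameter uncertainty.
        sigma2 = resid @ resid / rng.chisquare(n_obs - p)
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)

        # Impute under MAR, then shift by delta to represent an MNAR departure.
        noise = rng.normal(0.0, np.sqrt(sigma2), missing.sum())
        y_imp = y.copy()
        y_imp[missing] = X_mis @ beta + noise + delta

        estimates.append(y_imp.mean())
        variances.append(y_imp.var(ddof=1) / len(y_imp))

    # Rubin's rules: combine within- and between-imputation variance.
    estimates, variances = np.array(estimates), np.array(variances)
    total_var = variances.mean() + (1 + 1 / n_imputations) * estimates.var(ddof=1)
    return estimates.mean(), np.sqrt(total_var)


# Hypothetical data: y is less likely to be recorded when it is low.
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y_full = 1.0 + 0.5 * x + rng.normal(size=2000)
y = y_full.copy()
y[rng.random(2000) > 1.0 / (1.0 + np.exp(-y_full))] = np.nan

# delta = 0 corresponds to MAR; negative values assume unobserved y are lower.
results = {d: delta_adjusted_mi(y, x, delta=d) for d in (-1.0, -0.5, 0.0)}
for d, (est, se) in results.items():
    print(f"delta={d:+.1f}  pooled mean={est:.3f}  se={se:.3f}")
```

Reporting the pooled estimate across a range of `delta` values shows how sensitive the conclusion is to departures from MAR; a selection model-based approach would replace the fixed shift with an adjustment driven by the assumed dependence between the outcome and the selection process.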
Submissions
- A Multiple Imputation Comparison Analysis Approach to Health Surveys Subject to Selection Bias
- A multiple imputation-based sensitivity analysis approach for data subject to missing not at random
- A multiple imputation-based sensitivity analysis approach for regression analysis with an MNAR covariate
- Variational Bayesian Multiple Imputation in High-Dimensional Regression Models With Missing Responses