IPS 890 - Recent Advances in Missing Data Methods for Health Research

Category: IPS

Monday 6 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam) Room - Mississippi

Participants

Chiu-Hsieh Hsu (Organiser)

In health research, data are often extracted from electronic medical records (EMR) or collected through web-based surveys. Missing data are inevitable in EMR, particularly for biomarkers, which are often measured only in subjects susceptible to the disease of interest. Subjects with biomarker data are therefore more likely to have abnormal biomarker levels, so the missing values are likely to be missing not at random (MNAR). Because social media is popular and convenient, an increasing number of health researchers use it to conduct surveys for health research. Social media-based surveys usually lack a well-defined probability sampling structure and have higher nonresponse and coverage errors than traditional survey methods, so the resulting data are likely to be subject to selection bias. Ignoring potential MNAR and selection bias in data analysis can produce biased results and, in turn, questionable scientific conclusions. Beyond MNAR and selection bias, complex data structures, such as high-dimensional clustered data, may arise in health research depending on how the data are collected and how many variables are captured. Handling missing data is especially challenging in high-dimensional settings and requires specialized methods to overcome the computational burden.
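The bias described above can be illustrated with a small simulation. The sketch below is not one of the methods presented in the session. It assumes a standardised biomarker and a hypothetical logistic selection mechanism (slope 2.0, chosen only for illustration) in which higher levels are more likely to be measured, then compares the mean over all subjects with the mean over measured subjects only.

```python
import math
import random
import statistics


def simulate_mnar(n=20000, seed=0):
    """Simulate a biomarker whose chance of being measured rises with its value.

    Returns the mean over all subjects, the mean over measured subjects only
    (a complete-case estimate), and the fraction of subjects measured.
    """
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)  # true biomarker level (standardised)
        # Assumed selection mechanism: the probability of measurement
        # depends on the biomarker value itself, which makes the data MNAR.
        p_obs = 1.0 / (1.0 + math.exp(-2.0 * y))
        full.append(y)
        if rng.random() < p_obs:
            observed.append(y)
    return statistics.fmean(full), statistics.fmean(observed), len(observed) / n


true_mean, cc_mean, frac_observed = simulate_mnar()
print(f"mean over all subjects:      {true_mean:.3f}")
print(f"mean over measured subjects: {cc_mean:.3f}")
print(f"fraction measured:           {frac_observed:.3f}")
```

Under these assumptions the complete-case mean is shifted upward relative to the mean over all subjects, even though roughly half of the subjects are measured. An imputation model fitted only to the measured subjects would inherit the same shift unless the selection mechanism is accounted for.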

In this session, we will present recent advances in missing data methods, particularly imputation methods, for health research using EMR, survey data or high-dimensional data. Machine learning methods are popular for analyzing data with complex structures (such as high-dimensional data), and they rely on the missing at random (MAR) assumption to handle missing data. However, the missingness mechanism cannot be verified from the observed data, and the data may in fact be MNAR. The multiple imputation-based sensitivity analysis method derived from Heckman’s selection model, which is presented in this session, can be easily modified and applied to handle missing data subject to MNAR or selection bias in machine learning.