IPS 890 - Recent Advances in Missing Data Methods for Health Research
Category: IPS
In health research, data are often extracted from electronic medical records (EMR) or collected through web-based surveys. Missing data are inevitable in EMR, particularly for biomarkers, which are often measured only in subjects susceptible to the disease of interest. Subjects with recorded biomarker data are therefore more likely to have abnormal biomarker levels, so the data are missing not at random (MNAR). Because social media is popular and convenient, an increasing number of health researchers use it to conduct surveys. Social media-based surveys usually lack a well-defined probability sampling structure and have higher nonresponse and coverage errors than traditional survey methods, so the resulting data are likely to be subject to selection bias. Ignoring potential MNAR and selection bias in data analysis can produce biased results and lead to questionable scientific conclusions. In addition, complex data structures, such as high-dimensional clustered data, may arise in health research depending on how the data are collected and how many variables are captured. Handling missing data is especially challenging in high-dimensional settings and requires specialized methods to overcome the computational burden.
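The effect of ignoring MNAR can be illustrated with a small simulation. The sketch below is only an assumption-laden toy, not data or code from the session: the function name `simulate_mnar_bias`, the standard normal biomarker, and the logistic observation model with slope 2 are all hypothetical choices made for illustration. It compares the mean over all subjects with the mean over only those subjects whose biomarker happens to be recorded.

```python
import numpy as np


def simulate_mnar_bias(n=100_000, seed=0):
    """Toy MNAR example: higher biomarker values are more likely to be recorded.

    All modelling choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical biomarker, standardized so the true population mean is 0.
    biomarker = rng.normal(0.0, 1.0, n)

    # Probability of being recorded depends on the (possibly unobserved)
    # biomarker value itself, which is what makes the mechanism MNAR.
    p_observed = 1.0 / (1.0 + np.exp(-2.0 * biomarker))
    observed = rng.random(n) < p_observed

    full_mean = biomarker.mean()                  # mean over everyone
    complete_case_mean = biomarker[observed].mean()  # mean over recorded values only
    return full_mean, complete_case_mean, observed.mean()


full_mean, complete_case_mean, frac_observed = simulate_mnar_bias()
print(f"fraction recorded:    {frac_observed:.3f}")
print(f"mean, all subjects:   {full_mean:.3f}")
print(f"mean, recorded only:  {complete_case_mean:.3f}")
```

Under these assumptions the complete-case mean sits well above the full-population mean, which is the kind of bias the paragraph above warns about.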
In this session, we will present recent advances in missing data methods, particularly imputation methods, for health research using EMR, survey data, or high-dimensional data. Machine learning methods are popular for analyzing data with complex structures (such as high-dimensional data), and they typically rely on the missing at random (MAR) assumption to handle missing data. However, the missingness mechanism cannot be verified from the observed data, and the data may in fact be MNAR. The multiple imputation-based sensitivity analysis method derived from Heckman’s selection model, which is presented in this session, can be easily modified and applied to handle missing data subject to MNAR or selection bias in machine learning.
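To give a flavour of what a multiple imputation-based sensitivity analysis looks like in practice, here is a minimal sketch of a simpler, well-known alternative: delta adjustment under a pattern-mixture view. It is not the Heckman-selection-model method presented in the session. The function names, the normal linear imputation model, and the grid of `delta` values are all hypothetical. Values are imputed under MAR, shifted by `delta` to represent a departure from MAR, and the estimates are pooled with Rubin's rules.

```python
import numpy as np


def simulate_example(n=500, seed=1):
    """Hypothetical data: y depends on x, and larger y is more often missing."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)
    y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n)
    p_missing = 1.0 / (1.0 + np.exp(-(y - 1.5)))
    y[rng.random(n) < p_missing] = np.nan
    return x, y


def delta_adjusted_mi(y, x, delta, m=20, seed=0):
    """Pooled mean of y and its standard error after delta-adjusted imputation."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])
    Xo, yo = X[obs], y[obs]

    # Least-squares fit of the imputation model on the observed cases.
    XtX_inv = np.linalg.inv(Xo.T @ Xo)
    beta_hat = XtX_inv @ Xo.T @ yo
    resid = yo - Xo @ beta_hat
    df = Xo.shape[0] - Xo.shape[1]
    n_mis = int((~obs).sum())

    estimates, variances = [], []
    for _ in range(m):
        # Draw model parameters so imputations reflect parameter uncertainty.
        sigma2 = resid @ resid / rng.chisquare(df)
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)

        # Impute under MAR, then shift by delta (the sensitivity parameter).
        y_imp = y.copy()
        noise = rng.normal(0.0, np.sqrt(sigma2), n_mis)
        y_imp[~obs] = X[~obs] @ beta + noise + delta

        estimates.append(y_imp.mean())
        variances.append(y_imp.var(ddof=1) / len(y_imp))

    # Rubin's rules: within-imputation plus inflated between-imputation variance.
    q_bar = np.mean(estimates)
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    return q_bar, np.sqrt(within + (1.0 + 1.0 / m) * between)


x, y = simulate_example()
for delta in (0.0, 0.5, 1.0):  # delta = 0 corresponds to plain MAR imputation
    est, se = delta_adjusted_mi(y, x, delta)
    print(f"delta={delta:.1f}  pooled mean={est:.3f}  SE={se:.3f}")
```

Reporting the estimate across a range of `delta` values shows how far the conclusion moves as the assumed departure from MAR grows, which is the basic idea of a sensitivity analysis.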
Abstracts and papers
A multiple imputation-based sensitivity analysis approach for data missing not at random
For more details on registrations and submissions for the 65th ISI World Statistics Congress 2025, please log in to your account.