Machine Learning and Statistical Strategies in High-dimensional Predictive Modelling
Conference
65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Session: IPS 818 - High-Dimensional Statistical Analysis in Precision Medicine
Monday 6 October 2 p.m. - 3:40 p.m. (Europe/Amsterdam)
Abstract
Complex big data analysis is a challenging but rewarding research area, as data sets often include a large number of features, data contamination, unstructured patterns, and so on. Many models are now data driven and involve a large number of predictors, that is, high-dimensional data (HDD). For HDD analysis, many penalized methods have been introduced for simultaneous variable selection and parameter estimation when the model is sparse. However, a model may have sparse signals as well as a number of predictors with weak signals. In this scenario, variable selection methods may not distinguish predictors with weak signals from those with sparse signals. For this reason, we propose a high-dimensional shrinkage strategy to improve the prediction performance of a sub-model. We demonstrate that the proposed high-dimensional shrinkage strategy performs better than the penalized and machine learning methods in many cases. The relative performance of the proposed strategy is appraised by both simulation studies and a real data analysis. I will also discuss some open research problems and possible solutions.
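The abstract does not give the form of the proposed estimator, so the following is only an illustrative sketch of the general idea of shrinking a full-model fit toward a sub-model fit. It assumes a positive-part Stein-type weight, ridge fits for both models, a crude variance estimate, and a sub-model index set supplied by the user. The function names `ridge_fit` and `shrinkage_estimate`, the penalty `lam`, and the simulated data are all hypothetical and are not taken from the talk.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge estimate: solve (X'X + lam * I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def shrinkage_estimate(X, y, strong_idx, lam=1.0):
    """Illustrative positive-part Stein-type shrinkage (an assumption,
    not the authors' estimator): combine a full-model ridge fit with a
    sub-model ridge fit restricted to the columns in strong_idx."""
    n, p = X.shape
    strong_idx = np.asarray(strong_idx)
    weak_idx = np.setdiff1d(np.arange(p), strong_idx)

    # Full model uses every predictor; sub-model keeps only strong_idx.
    beta_full = ridge_fit(X, y, lam)
    beta_sub = np.zeros(p)
    beta_sub[strong_idx] = ridge_fit(X[:, strong_idx], y, lam)

    # Crude noise-variance estimate from the full-model residuals.
    resid = y - X @ beta_full
    sigma2 = resid @ resid / n

    # How much fitted signal the full model assigns to dropped predictors.
    fitted_weak = X[:, weak_idx] @ beta_full[weak_idx]
    signal = fitted_weak @ fitted_weak
    k = weak_idx.size

    if signal <= 0:
        weight = 0.0  # nothing found outside the sub-model
    elif sigma2 <= 0:
        weight = 1.0  # no residual noise: trust the full fit
    else:
        t_stat = signal / sigma2
        # Positive-part weight, clipped so the result stays between the two fits.
        weight = float(np.clip(1.0 - (k - 2) / t_stat, 0.0, 1.0))

    beta_shrink = beta_sub + weight * (beta_full - beta_sub)
    return beta_shrink, weight


# Simulated example: 5 strong signals, 20 weak signals, the rest zero.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 3.0
beta_true[5:25] = 0.1
y = X @ beta_true + rng.standard_normal(n)

beta_hat, w = shrinkage_estimate(X, y, strong_idx=np.arange(5))
```

A weight near 0 returns the sub-model fit, while a weight near 1 returns the full-model fit. In practice the sub-model would come from a variable selection method rather than being fixed in advance as it is here.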