Advances in Bayesian Hierarchical Modeling and Variable Selection for Complex Data
Conference
Category: International Statistical Institute
Abstract
1. The multivariate spike-and-slab LASSO: Algorithms, asymptotics, and inference
We consider multivariate linear regression models to predict q correlated responses (possibly of mixed type) using a common set of p predictors. Our interest lies not only in determining whether a particular predictor has a direct or marginal effect on each response but also in understanding the residual dependence between the outcomes. We propose a Bayesian procedure for both tasks using continuous spike-and-slab priors. Rather than relying on a stochastic search through the high-dimensional parameter space, we develop an Expectation Conditional Maximization (ECM) algorithm targeting modal estimates of the matrix of regression coefficients and the residual precision matrix. A key feature of our method is that it explicitly models our uncertainty about which parameters are negligible. We further derive posterior contraction rates and discuss several strategies for quantifying posterior uncertainty.
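The adaptive-penalty mechanism behind spike-and-slab LASSO modal estimation can be sketched for a single response. This is a simplified illustration, not the authors' multivariate procedure. It assumes a fixed mixing weight theta, Laplace slab and spike rates lam1 < lam0, and a known noise variance sigma2. The E-step computes the probability that each coefficient comes from the slab, and the CM-step performs a weighted-lasso coordinate update. The multivariate method also estimates the residual precision matrix, which this sketch omits. All function names are hypothetical.

```python
import math


def inclusion_prob(beta, theta, lam1, lam0):
    """E-step: P(beta came from the slab) under Laplace slab (rate lam1) and spike (rate lam0).

    Written in logistic form so large |beta| does not produce 0/0; assumes lam0 > lam1.
    """
    ratio = ((1.0 - theta) / theta) * (lam0 / lam1) * math.exp(-(lam0 - lam1) * abs(beta))
    return 1.0 / (1.0 + ratio)


def adaptive_penalty(beta, theta, lam1, lam0):
    """Penalty rate for beta: slab and spike rates weighted by the inclusion probability."""
    p = inclusion_prob(beta, theta, lam1, lam0)
    return lam1 * p + lam0 * (1.0 - p)


def soft_threshold(z, t):
    """Lasso shrinkage operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def ssl_coordinate_ascent(X, y, theta=0.5, lam1=1.0, lam0=20.0, sigma2=1.0, n_iter=100):
    """Hypothetical single-response SSL mode search (theta and sigma2 held fixed)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # E-step: reweight the penalty on beta_j given its current value
            pen = adaptive_penalty(beta[j], theta, lam1, lam0)
            # partial residual that excludes predictor j
            r_j = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j) for i in range(n)]
            z = sum(X[i][j] * r_j[i] for i in range(n))
            xx = sum(X[i][j] ** 2 for i in range(n))
            # CM-step: weighted-lasso coordinate update
            beta[j] = soft_threshold(z, sigma2 * pen) / xx
    return beta
```

For example, with orthogonal predictors X = [[1, 1], [-1, 1], [1, -1], [-1, -1], [1, 0], [-1, 0]] and y = [10, -10, 10, -10, 10, -10], the first coefficient escapes the spike. It then receives only the small slab penalty, while the null coefficient stays at exactly zero. This self-adaptive shrinkage separates the approach from a single-penalty lasso.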
2. Bayesian-frequentist Hybrid Inference in Applications with Small Sample Sizes
The Bayesian-frequentist hybrid model and its associated inference can combine the advantages of Bayesian and frequentist methods while avoiding their limitations. However, except for a few special cases in the existing literature, computation under the hybrid model is generally non-trivial and sometimes intractable. We develop a computational algorithm for hybrid inference under any general loss function. Simulations and data examples demonstrate that hybrid inference can improve upon frequentist inference by incorporating valuable prior information. It can also improve upon Bayesian inference based on non-informative priors, which can yield biased estimates at small sample sizes.
3. Variable Selection in Bayesian Multiple Instance Regression using Shotgun Stochastic Search
In multiple instance learning (MIL), each sample has a set of individually observed covariate vectors (instances) but only one response variable shared by all of its instances. We propose a Bayesian model that addresses two selection problems. The first is instance selection, which identifies the instances capable of explaining the response. The second is variable selection, which searches for the covariates related to the response. For variable selection, we adopt stochastic search variable selection (George and McCulloch (1993)) to identify the best subset of explanatory variables, an approach that has not previously received attention in the MIL literature. Our novel model solves the two selection tasks simultaneously by modifying the shotgun stochastic search algorithm (Hans et al. (2007)), which enables Markov chain Monte Carlo (MCMC) to explore a large discrete space more efficiently.
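A generic shotgun stochastic search over variable subsets can be sketched as follows. This is an illustration of the basic Hans et al. (2007) idea, not the modified MIL algorithm described above, and it does not perform instance selection. It assumes a user-supplied log model score, such as a log marginal likelihood plus a log model prior. At each step, every model in the add, delete and swap neighborhoods is scored. One model is drawn from each neighborhood in proportion to its score, the next model is drawn from those candidates in the same way, and a list of the best models seen so far is retained. The function names and the toy score are hypothetical.

```python
import math
import random


def neighbors(model, p):
    """Add, delete and swap neighborhoods of a model (a frozenset of predictor indices)."""
    outside = [j for j in range(p) if j not in model]
    add = [model | {j} for j in outside]
    delete = [model - {j} for j in model]
    swap = [(model - {i}) | {j} for i in model for j in outside]
    return add, delete, swap


def sample_proportional(pairs, rng):
    """Draw one (model, log_score) pair with probability proportional to exp(log_score)."""
    top = max(s for _, s in pairs)
    weights = [math.exp(s - top) for _, s in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0]


def shotgun_stochastic_search(score, p, n_iter=50, keep=10, seed=0):
    """Return the `keep` highest-scoring models visited, best first."""
    rng = random.Random(seed)
    current = frozenset()
    best = {current: score(current)}
    for _ in range(n_iter):
        draws = []
        for group in neighbors(current, p):
            if not group:
                continue
            # score the whole neighborhood in one pass (the "shotgun" step)
            scored = [(m, score(m)) for m in group]
            best.update(scored)
            draws.append(sample_proportional(scored, rng))
        # move to one of the per-neighborhood draws, again proportional to score
        current = sample_proportional(draws, rng)[0]
        # retain only the `keep` highest-scoring models seen so far
        best = dict(sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:keep])
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

Usage with a toy score that peaks at the subset {0, 2}: `shotgun_stochastic_search(lambda m: -3.0 * len(m ^ {0, 2}), p=5)`. Scoring whole neighborhoods rather than single proposals is what lets the search cover a large discrete space quickly.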
4. Bayesian Empirical Likelihood with Dual Penalties for Variable Selection in Ultra-high Dimensional Data
In the semi-parametric setting with ultra-high dimensional data, we propose a Bayesian empirical likelihood method for variable selection that requires no distributional assumptions, only estimating equations. Motivated by the doubly penalized empirical likelihood (EL) of Chang et al. (2018), we introduce priors that regularize both the regression parameters and the Lagrange multipliers associated with the estimating equations to promote sparse learning. We show theoretically that posterior consistency and variable selection consistency hold under mild conditions. We further develop an efficient Markov chain Monte Carlo (MCMC) sampling algorithm based on the active set idea, which has proven useful for reducing computational burden.
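The basic Bayesian empirical likelihood construction can be illustrated in one dimension. This is a toy sketch, not the proposed ultra-high dimensional method. It uses the single estimating equation g_i = x_i - mu for a mean. The Lagrange multiplier is found by bisection on its feasible interval, a Laplace prior is assumed on mu, and the EL-based posterior is sampled with random-walk Metropolis. The dual penalty on the Lagrange multipliers and the active-set MCMC are not implemented. All names, the prior rate and the step size are hypothetical.

```python
import math
import random


def log_el_ratio_mean(x, mu, tol=1e-12, max_iter=200):
    """Log empirical likelihood ratio -sum log(1 + lam * g_i) with g_i = x_i - mu."""
    g = [xi - mu for xi in x]
    if min(g) >= 0 or max(g) <= 0:
        return -math.inf  # mu outside the convex hull of the data: EL is zero
    # lam must keep every implied weight positive: 1 + lam * g_i > 0
    lo, hi = -1.0 / max(g), -1.0 / min(g)
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        f = sum(gi / (1.0 + lam * gi) for gi in g)  # strictly decreasing in lam
        if f > 0:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return -sum(math.log1p(lam * gi) for gi in g)


def bel_log_posterior(x, mu, rate=1.0):
    """Laplace(0, 1/rate) log prior plus log EL ratio, up to an additive constant."""
    return log_el_ratio_mean(x, mu) - rate * abs(mu)


def rw_metropolis(x, n_draws=2000, step=0.5, seed=1):
    """Random-walk Metropolis on the Bayesian EL posterior of the mean."""
    rng = random.Random(seed)
    mu = sum(x) / len(x)
    lp = bel_log_posterior(x, mu)
    draws = []
    for _ in range(n_draws):
        prop = mu + rng.gauss(0.0, step)
        lp_prop = bel_log_posterior(x, prop)
        # proposals outside the data's convex hull have lp_prop = -inf and are rejected
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            mu, lp = prop, lp_prop
        draws.append(mu)
    return draws
```

Because the empirical likelihood is zero outside the convex hull of the data, the sampler never leaves the data range. This illustrates how EL replaces a parametric likelihood with constraints from estimating equations alone.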