Use of Random Forests in Small Area Estimation
Conference
65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Keywords: Fay-Herriot, random forest, small area estimation
Thursday 9 October 2 p.m. - 3:40 p.m. (Europe/Amsterdam)
Abstract
The demand for statistics on very detailed subgroups of a population is increasing rapidly. However, in a typical probability survey, design-consistent direct estimators of population parameters can be unstable when domain sample sizes are small. To improve the precision of direct estimators, the Fay-Herriot (F-H) area-level model is often used. This model relies on the validity of a linear linking model, which specifies the relationship between the parameter of interest and the auxiliary variables. The linearity assumption is not always reasonable, in which case the linking model must be modified, for example by using a piecewise linear model. As the demand for small area estimates grows, it becomes increasingly relevant to evaluate nonparametric linking models, such as random forests, to determine whether they provide some robustness against departures from the linearity assumption. We chose to investigate random forests for two main reasons: i) they can be applied to a mixture of categorical and continuous auxiliary variables, and ii) they produce predictions that remain within the range of observed values. In this talk, I will assess the properties of using random forests in the F-H model through a simulation study.
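To make the role of the linking model concrete, here is a minimal sketch of the standard F-H composite estimator, which shrinks each direct estimate toward a model-based (synthetic) prediction with weight gamma_i = sigma2_v / (sigma2_v + psi_i). The function name, argument names, and numbers are hypothetical and chosen for illustration only. The sketch assumes the sampling variances psi_i and the model variance sigma2_v are known, and it takes the synthetic predictions as given, so they could come from a linear fit or from a random forest. Substituting random forest predictions in this way is an assumption made here for illustration, not necessarily the procedure evaluated in the talk.

```python
def fh_composite(direct, synthetic, psi, sigma2_v):
    """Combine direct and synthetic estimates area by area.

    direct    : design-based direct estimates, one per area
    synthetic : linking-model predictions (linear or, by assumption,
                random forest), one per area
    psi       : sampling variances of the direct estimates (assumed known)
    sigma2_v  : variance of the area-level random effects (assumed known)
    """
    if sigma2_v < 0:
        raise ValueError("sigma2_v must be non-negative")
    estimates = []
    for y, m, p in zip(direct, synthetic, psi):
        if p <= 0:
            raise ValueError("sampling variances must be positive")
        # Weight on the direct estimate: close to 1 when the direct
        # estimate is precise (small psi), close to 0 when it is noisy.
        gamma = sigma2_v / (sigma2_v + p)
        estimates.append(gamma * y + (1.0 - gamma) * m)
    return estimates


# Hypothetical inputs: two areas with different sampling variances.
direct = [10.0, 20.0]
synthetic = [12.0, 18.0]
psi = [1.0, 3.0]
result = fh_composite(direct, synthetic, psi, sigma2_v=1.0)
print(result)
```

The area with the larger sampling variance is pulled more strongly toward its synthetic prediction, which is why the quality of the linking model matters most for the smallest domains.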