Use of Random Forests in Small Area Estimation

Author

Keven Bosa

Conference

65th ISI World Statistics Congress

Format: IPS Abstract - WSC 2025

Keywords: fay-herriot, random forest, small area estimation

Session: IPS 767 - Enhancing Public Confidence in Analytic Quality and Privacy Protection for Public-Use Statistics

Thursday 9 October 2 p.m. - 3:40 p.m. (Europe/Amsterdam)

Abstract

The demand for statistics on very detailed subgroups of a population is increasing rapidly. However, for a typical probability survey, design-consistent direct estimators of population parameters can be unstable when domain sample sizes are small. To improve the precision of direct estimators, the Fay-Herriot (F-H) area-level model is often used. This model relies on the validity of a linear linking model, which specifies the relationship between the parameter of interest and the auxiliary variables. The linearity assumption is not always reasonable, in which case the linking model must be modified, for example by using a piecewise linear model. As the demand for small area estimates increases, it becomes increasingly relevant to evaluate nonparametric linking models, such as random forests, to determine whether they can provide some robustness against departures from the linearity assumption. We have chosen to investigate random forests for two main reasons: i) they can be applied to a mixture of categorical and continuous auxiliary variables, and ii) they produce predictions that remain within the range of observed values. During this talk, I will use a simulation study to assess the properties of random forests used as the linking model in the F-H model.
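
For reference, the standard F-H formulation and a nonparametric variant of the linking model can be sketched as follows. The notation (the direct estimator \hat{\theta}_i, auxiliary vector x_i, known sampling variance \psi_i, model variance \sigma_v^2, and linking function f) is assumed here for illustration and is not taken from the talk. In particular, replacing the linear term with a generic function f is only one plausible way to write a random forest linking model.

```latex
\begin{align*}
% Sampling model: direct estimator for area i, sampling variance \psi_i treated as known
\hat{\theta}_i &= \theta_i + e_i, & e_i &\sim N(0, \psi_i), \quad i = 1, \dots, m \\
% Linear linking model (standard F-H)
\theta_i &= x_i^{\top} \beta + v_i, & v_i &\sim N(0, \sigma_v^2) \\
% Shrinkage predictor combining the direct and model-based estimates
\tilde{\theta}_i &= \gamma_i \hat{\theta}_i + (1 - \gamma_i)\, x_i^{\top} \hat{\beta}, & \gamma_i &= \frac{\sigma_v^2}{\sigma_v^2 + \psi_i} \\
% Nonparametric linking model (assumed form, e.g. f fitted by a random forest)
\theta_i &= f(x_i) + v_i, & v_i &\sim N(0, \sigma_v^2)
\end{align*}
```

In this sketch, areas with a large sampling variance \psi_i have \gamma_i close to 0, so their estimate is driven mostly by the linking model. This is why a misspecified linear form matters most for the smallest domains.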