65th ISI World Statistics Congress 2025

On the robustness of random forests for genomic prediction and selection in breeding studies

Conference: 65th ISI World Statistics Congress 2025

Format: CPS Abstract - WSC 2025

Keywords: machine learning, random forests, robust modelling

Session: CPS 19 - Genomics and Conservation

Monday 6 October 5:10 p.m. - 6:10 p.m. (Europe/Amsterdam)

Abstract

Analyses of real data are often vulnerable to violations of the underlying model assumptions, a problem exacerbated by data irregularities such as errors or outliers. In linear regression, even a single outlier can violate the normality assumption, compromising parameter estimation and any inference that depends on it. Machine learning methods, including Random Forests, are not immune to data contamination, and the literature has recognized the need for robust statistical techniques to address it, particularly in high-dimensional data analysis involving variable selection and prediction.
Data contamination can occur in both the response (output) and the covariates (features); this work focuses primarily on the former. We will evaluate the performance of the classical Random Forest method through simulations and incorporate robust techniques to enhance its resilience to contamination. Specifically, we will use a synthetic animal dataset from the literature and introduce various plausible contamination scenarios. The study aims to shed light on the implications of data contamination for genomic prediction and selection in breeding studies, and to offer insights into robust adaptations of Random Forests that help mitigate the challenges posed by certain types of contamination. Ideally, the outcome would be a robust counterpart of the Random Forests algorithm that can be used routinely alongside the classical version in genomic prediction and selection studies.
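
As a purely illustrative sketch of what response-level contamination can look like, the Python snippet below shifts a random fraction of toy phenotype values and compares how far the mean and the median move. The function name `contaminate_response`, the shift-outlier scenario, the contamination fraction, and the toy Gaussian phenotypes are all assumptions made for illustration; they are not the scenarios, data, or robust techniques used in the study.

```python
import random
import statistics


def contaminate_response(y, fraction=0.1, shift=10.0, seed=0):
    """Return (contaminated copy of y, indices of contaminated entries).

    Hypothetical shift-outlier scenario: a random `fraction` of responses
    is moved upward by `shift` sample standard deviations.
    """
    rng = random.Random(seed)
    n_bad = round(fraction * len(y))
    bad_idx = sorted(rng.sample(range(len(y)), n_bad))
    sd = statistics.stdev(y)
    y_cont = list(y)
    for i in bad_idx:
        y_cont[i] += shift * sd
    return y_cont, bad_idx


# Toy phenotypes (assumption): 200 standard-normal values.
toy_rng = random.Random(1)
y_clean = [toy_rng.gauss(0.0, 1.0) for _ in range(200)]
y_cont, bad_idx = contaminate_response(y_clean, fraction=0.1, shift=10.0)

# Compare how strongly each location estimate reacts to the contamination.
mean_shift = abs(statistics.mean(y_cont) - statistics.mean(y_clean))
median_shift = abs(statistics.median(y_cont) - statistics.median(y_clean))
print(f"contaminated observations: {len(bad_idx)}")
print(f"shift in mean:   {mean_shift:.3f}")
print(f"shift in median: {median_shift:.3f}")
```

The mean-versus-median comparison is only meant to convey why response outliers matter for a method that averages responses within terminal nodes; whether a median-type aggregation is among the robust adaptations considered here is not stated in the abstract.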