
On the robustness of machine learning methods for genomic prediction

Author

Vanda Marisa da Rosa Milheiro Lourenco

Co-author

Joseph O. Ogutu
Hans-Peter Piepho

Conference

64th ISI World Statistics Congress

Format: IPS Abstract

Keywords: accuracy, disease-modelling, genomics, machine learning, robustness, supervised learning

Session: IPS 169 - Statistical Research by Women from Around the Globe

Monday 17 July 2 p.m. - 3:40 p.m. (Canada/Eastern)

Abstract

The accurate prediction of genomic breeding values is central to genomic selection in both plant and animal breeding studies. Genomic prediction (GP) involves the use of thousands of molecular markers spanning the entire genome and therefore requires methods able to efficiently handle high-dimensional data. Machine learning (ML) methods, which encompass different groups of supervised and unsupervised learning methods, are increasingly advocated for and used in GP studies. Although several studies have compared the predictive performance of individual methods, studies comparing the predictive performance of different groups of methods are rare. The same is true of studies that assess the predictive performance of methods when the data are contaminated. Such studies are nevertheless crucial for (i) identifying groups of methods with superior predictive performance, and (ii) assessing the merits and demerits of such groups of methods relative to each other and to the established classical methods, both when the phenotypic data are contaminated and when they are not.

Here, we compare the genomic predictive performance and robustness of several groups of supervised ML methods, specifically regularized, ensemble, and instance-based methods, in terms of predictive accuracy and prediction errors. The comparison uses one simulated dataset (animal breeding population; three distinct traits).
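
The kind of comparison described above can be illustrated with a minimal sketch. It is not the authors' simulation or code: the toy data, the shift-outlier contamination scheme, the choice of a k-nearest-neighbour predictor as the instance-based example, and every name (`pearson`, `mse`, `contaminate`, `knn_predict`, `tbv`) are assumptions made for illustration. Predictive accuracy is taken here as the Pearson correlation between predictions and simulated true breeding values, and prediction error as the mean squared error.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists (used as predictive accuracy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mse(pred, obs):
    """Mean squared prediction error."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def contaminate(y, frac, shift, rng):
    """Copy y and add `shift` to round(frac * n) randomly chosen entries.

    Assumed contamination scheme (shift outliers), not taken from the abstract.
    """
    out = list(y)
    for i in rng.sample(range(len(y)), round(frac * len(y))):
        out[i] += shift
    return out


def knn_predict(train_x, train_y, query, k):
    """Instance-based prediction: mean phenotype of the k nearest training genotypes."""
    dist = sorted(
        (sum((a - b) ** 2 for a, b in zip(g, query)), i)
        for i, g in enumerate(train_x)
    )
    return sum(train_y[i] for _, i in dist[:k]) / k


# Hypothetical toy population: n individuals, p markers coded 0/1/2.
rng = random.Random(1)
n, p = 120, 30
effects = [rng.gauss(0, 1) for _ in range(p)]
geno = [[rng.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]
tbv = [sum(e * g for e, g in zip(effects, row)) for row in geno]  # simulated true breeding values
pheno = [t + rng.gauss(0, 1) for t in tbv]  # phenotype = breeding value + noise

train, test = range(0, 90), range(90, n)
scenarios = (
    ("clean", pheno),
    ("contaminated", contaminate(pheno, 0.1, 10.0, rng)),
)
for label, y in scenarios:
    train_x = [geno[i] for i in train]
    train_y = [y[i] for i in train]
    pred = [knn_predict(train_x, train_y, geno[i], k=5) for i in test]
    true = [tbv[i] for i in test]
    print(label, round(pearson(pred, true), 3), round(mse(pred, true), 3))
```

Evaluating against the simulated true breeding values rather than the observed phenotypes keeps the contamination out of the evaluation target, so any change between the two scenarios reflects how the fitted method reacts to contaminated training data. A regularized or ensemble method could be substituted for `knn_predict` in the same loop.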