65th ISI World Statistics Congress 2025

Application of machine learning on oral microbiome and resistome profiles: a multicenter cohort study

Author

Bente Sved Skottvoll

Co-author

  • Randi J. Bertelsen
  • Maryia Khomich
  • Jale Moradi

Conference

65th ISI World Statistics Congress 2025

Format: IPS Abstract - WSC 2025

Keywords: compositional, international, machine learning, metagenomics, microbiome, population

Abstract

The oral microbiome is among the most diverse microbial habitats in the human body, harboring bacteria that may carry antimicrobial resistance genes (ARGs). The oral cavity is an entry point for both opportunistic and pathogenic bacteria. A related question is whether the resistome, the collection of ARGs carried by oral microbes, may transfer to non-oral bacteria and contribute to the dissemination of ARGs within an individual's bacteriome. We explored the resistome, the microbiome and associated factors in oral samples from adult participants collected as part of the European Community Respiratory Health Survey III (ECRHS III, 2011-2014), a multicenter population-based study.

Oral subgingival fluid samples from study centers in Australia (N=107), Estonia (N=109) and Norway (N=119) (median age 53 years [range 40-65]; 50% men) underwent metagenomic sequencing. Taxonomic content was profiled with MetaPhlAn4, and antimicrobial resistance (AMR) content with ARIBA using the CARD and ResFinder databases. The data were linked to selected clinical and questionnaire data from the same participants.

Spearman's rank correlation analyses showed few correlations between bacterial families and ARGs, consistent with existing knowledge that ARGs usually spread horizontally within a limited set of taxa. A small set of genes, including tet genes and the broad-spectrum beta-lactamase genes blaOXA, cfxA3 and cfxA5, correlated with many species. These more frequent species correlations suggest that such horizontally transferable genes are more widely distributed and of greater ecological relevance.
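A gene-by-species correlation screen of this kind can be illustrated with a small, stdlib-only Python sketch. It computes Spearman's rho as the Pearson correlation of average ranks, so tied values (such as the many zero counts typical of ARG and taxon abundance tables) receive mean ranks, and then lists the species whose |rho| reaches a cutoff for one gene. The function names, the toy data and the `threshold=0.5` cutoff are hypothetical; the abstract does not state the threshold or multiple-testing procedure used, and a real analysis would also attach p-values.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; NaN when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / math.sqrt(va * vb)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlated_species(gene_abundance, species_table, threshold=0.5):
    """Hypothetical helper: species whose |rho| with the gene >= threshold."""
    hits = []
    for name, abundance in species_table.items():
        rho = spearman(gene_abundance, abundance)
        if not math.isnan(rho) and abs(rho) >= threshold:
            hits.append(name)
    return hits


if __name__ == "__main__":
    gene = [0, 1, 2, 3, 4]  # hypothetical ARG counts across five samples
    species = {
        "species_up": [1, 2, 3, 4, 5],
        "species_flat": [1, 1, 1, 1, 1],  # constant: rho undefined, skipped
        "species_down": [5, 4, 3, 2, 1],
    }
    print(correlated_species(gene, species))
```

Counting, per gene, how many species pass the cutoff is one simple way to express the idea that a few genes "correlate with many species" while most correlate with few.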

Boruta was used to identify ARGs associated with the available metadata. The study center was the most consistent feature explaining both resistome and microbial species composition. For participants who had used antibiotics for upper respiratory symptoms in the last 12 months, Boruta selected the class A beta-lactamase gene cfxA3 (100%) and ErmX (81%), a gene encoding an rRNA methyltransferase that protects the ribosome against antibiotic binding. For participants hospitalized within the 12 months prior to sampling, the mosaic tetracycline resistance gene Tet(W/N/W) (89%) gave the strongest signal. The compositional data analysis tools ANCOM-BC2, MaAsLin2 and ALDEx may subsequently help determine whether the presence of an ARG, a taxon, or a combination of both can be predicted from the available metadata.
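The core of Boruta's decision rule can be sketched in stdlib-only Python. Boruta itself wraps a random forest; to stay dependency-free, this sketch swaps in a much simpler, hypothetical importance score: the absolute difference in a feature's mean between outcome groups (for example, antibiotic use yes/no). In each round every feature is shuffled into a "shadow" copy, a real feature scores a hit when its importance beats the best shadow, and the hit counts are then tested against Binomial(n_iter, 0.5) with a Bonferroni correction. The function names, toy features and defaults (`n_iter=20`, `alpha=0.05`) are assumptions, not the study's configuration, and the full algorithm re-tests hit counts as iterations proceed instead of once at the end.

```python
import math
import random


def importance(x, y):
    """Hypothetical stand-in for random-forest importance:
    |mean(x | y == 1) - mean(x | y == 0)|."""
    pos = [xi for xi, yi in zip(x, y) if yi == 1]
    neg = [xi for xi, yi in zip(x, y) if yi == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def binom_tail_ge(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def boruta_sketch(features, y, n_iter=20, alpha=0.05, seed=0):
    """Simplified Boruta-style selection over a dict {name: values}."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Shadow features: shuffled copies that keep each feature's
        # distribution but break any link to the outcome y.
        shadow_best = 0.0
        for values in features.values():
            shadow = list(values)
            rng.shuffle(shadow)
            shadow_best = max(shadow_best, importance(shadow, y))
        # A hit is recorded when a real feature beats the best shadow.
        for name, values in features.items():
            if importance(values, y) > shadow_best:
                hits[name] += 1
    threshold = alpha / len(features)  # Bonferroni correction
    decisions = {}
    for name, h in hits.items():
        if binom_tail_ge(h, n_iter) < threshold:
            decisions[name] = "confirmed"  # too many hits to be chance
        elif binom_tail_ge(n_iter - h, n_iter) < threshold:
            decisions[name] = "rejected"  # P(X <= h) by symmetry
        else:
            decisions[name] = "tentative"
    return decisions


if __name__ == "__main__":
    # Toy data: 20 participants with the exposure (y=1), 20 without.
    y = [1] * 20 + [0] * 20
    features = {
        "hypothetical_arg_A": list(y),  # present only in exposed participants
        "hypothetical_arg_B": [1, 0] * 20,  # balanced across both groups
    }
    print(boruta_sketch(features, y))
```

In the toy data, the first hypothetical ARG mirrors the outcome exactly and is confirmed, while the second is equally common in both groups and is rejected; the shadow comparison is what lets Boruta separate "all relevant" features from noise without a fixed importance cutoff.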

Identifying the study center as an important explanatory variable is consistent with current knowledge of regional differences in the prevalence of ARGs and oral species. With a few exceptions, both Spearman's rank correlation and Boruta feature selection passed over most genes encoding multidrug efflux pumps, which are thought to have broad functions and low antibiotic specificity. Boruta was robust in selecting biologically meaningful variables and can help narrow the candidate set for focused downstream analysis. Few clinically relevant ARGs were found in our participants, suggesting that screening of the oral microbiota is not advisable for AMR surveillance in the general population.