65th ISI World Statistics Congress 2025

Integrating machine learning and statistical methods for corruption risk assessment: a case study in public procurement

Conference

65th ISI World Statistics Congress 2025

Format: IPS Abstract - WSC 2025

Keywords: anomaly detection; corruption risk; red flag indicators

Session: IPS 695 - Statistics Concourse of Machine Learning and Artificial Intelligence

Thursday 9 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)

Abstract

Corruption affects both developed and developing nations. To control it effectively and manage resources efficiently, governments must prevent fraud, malfeasance, and misconduct. Recent strategies focus on prevention, eliminating opportunities for corruption through education, public awareness, and national anti-corruption programs. Risk assessment is key, with red flag indicators playing a central role in detecting corruption risks and formulating proactive mitigation strategies. Red flag indicators analyze anomalies in a system and raise alerts for potential risks. In the era of big data, timely information processing is crucial. Machine learning techniques allow continuous improvement and adaptation: as new data is gathered, models refine their predictions and become more accurate over time. This dynamic ability is essential in facing evolving forms of corruption. Statistical methods strengthen machine learning-based corruption risk assessment at every stage, from data retrieval to anomaly detection and prediction. By integrating both fields, a risk assessment tool can make accurate predictions while remaining transparent, shedding light on how variables contribute to outcomes. For example, statistics help assess data quality: errors and inconsistencies may arise from carelessness, inexperience, or faulty data architecture, especially when data is scraped from various sources. The risk assessment system should distinguish real anomalies from false positives, such as minor data entry mistakes. Quantile regression can also be applied to detect anomalies, since it focuses on the extremes of a distribution rather than its center. This approach can be applied to public procurement, a sector particularly vulnerable to corruption. In Italy specifically, where detailed tender data is available, this method can highlight corruption risks more effectively.
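As a rough illustration of the quantile regression idea mentioned in the abstract, the sketch below fits an upper conditional quantile line and flags tenders that lie above it. It is not the authors' implementation. The data is synthetic, and the variable names (estimated_value, awarded_price), the quantile level 0.9, and the two injected outliers are all assumptions made for the example. The fit uses an exhaustive search that relies on the fact that, with a single predictor, some optimal quantile regression line passes through two observations; this is only practical for small samples.

```python
import itertools
import random


def pinball_loss(residual, q):
    """Check (pinball) loss of one residual at quantile level q."""
    return q * residual if residual >= 0 else (q - 1) * residual


def fit_quantile_line(xs, ys, q):
    """Fit y = intercept + slope * x at quantile q by exhaustive search.

    Every line through two observations is tried and the one with the
    smallest total pinball loss is kept. Cost grows as O(n^3).
    """
    best = None
    for i, j in itertools.combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(
            pinball_loss(y - (intercept + slope * x), q)
            for x, y in zip(xs, ys)
        )
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Synthetic tenders (hypothetical): awarded price tracks estimated value
# with bounded noise.
rng = random.Random(0)
estimated_value = [float(x) for x in range(60)]
awarded_price = [x + rng.uniform(-5.0, 5.0) for x in estimated_value]

# Two injected outliers (indices 60 and 61), priced far above the trend.
estimated_value += [20.0, 40.0]
awarded_price += [80.0, 100.0]

q = 0.9  # assumed quantile level; higher values look further into the tail
intercept, slope = fit_quantile_line(estimated_value, awarded_price, q)

# Excess of each tender over the fitted upper quantile line.
excess = [
    y - (intercept + slope * x)
    for x, y in zip(estimated_value, awarded_price)
]

# Flag tenders strictly above the line (small tolerance for rounding),
# ranked by how far they exceed it.
flagged = [i for i, e in enumerate(excess) if e > 1e-9]
ranked = sorted(flagged, key=lambda i: excess[i], reverse=True)
print("flagged tender indices, largest excess first:", ranked)
```

By construction, at most a share of about 1 - q of the observations falls above the fitted line, so some ordinary tenders are flagged along with the injected ones. Ranking by excess, or requiring a minimum margin above the line, is one way to separate large deviations from ordinary noise before any manual review.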