Statistics and Generative AI: What about the wizard behind the curtain?
Conference
65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Session: IPS 699 - ChatGPT: Challenges and Opportunities to Statistical Research and Education
Wednesday 8 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)
Abstract
While there is a lot of hype about Artificial Intelligence (AI) and Generative AI, we sometimes forget the statistics and mathematics behind the curtain. The main objective of AI is to make “data-informed” decisions, using data and optimization algorithms to ensure accurate prediction, at least for the data the algorithm is trained on. The problem is that the training data can be incomplete. The immediate question is: are there consequences for the decision-making process, even if we have 99% accuracy on the training data? Could statistical methods such as stratification help in this process? Mathematically and computationally, this could be a big issue: with such large data sets, how do we begin to obtain the proportions of the various attributes in the training data? Could rank-based methods based on Non-negative Factorization resolve some of the complexities of this problem? In this talk we will discuss the question of incomplete data and algorithmic bias, in an effort to understand the intricacies of the black box that is AI.
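The question about 99% accuracy can be made concrete with a small sketch. The Python below is an illustration only, not the method presented in the talk: the function name `accuracy_by_stratum`, the group labels "A" and "B", and all counts are hypothetical, chosen so that overall accuracy is exactly 99% while one underrepresented stratum is served far worse.

```python
from collections import defaultdict


def accuracy_by_stratum(records):
    """Return (overall accuracy, {stratum: accuracy}).

    `records` is an iterable of (stratum, is_correct) pairs, where
    `is_correct` says whether the model's prediction matched the label.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for stratum, is_correct in records:
        totals[stratum] += 1
        correct[stratum] += int(is_correct)
    overall = sum(correct.values()) / sum(totals.values())
    per_stratum = {s: correct[s] / totals[s] for s in totals}
    return overall, per_stratum


# Hypothetical training set (assumed numbers, not real data):
# group A has 980 records, group B only 20.
records = (
    [("A", True)] * 978 + [("A", False)] * 2
    + [("B", True)] * 12 + [("B", False)] * 8
)

overall, per_stratum = accuracy_by_stratum(records)
print(f"overall: {overall:.3f}")
for stratum, acc in sorted(per_stratum.items()):
    print(f"{stratum}: {acc:.3f}")
```

Under these assumed counts, the aggregate figure is dominated by the large stratum, which is why reporting accuracy per stratum (or sampling the training data in known proportions) could expose a gap that a single overall number hides.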