Statistics and Generative AI: What about the wizard behind the curtain?

Author

Prof. Zhiwu Zhang

Conference

65th ISI World Statistics Congress

Format: IPS Abstract - WSC 2025

Session: IPS 699 - ChatGPT: Challenges and Opportunities to Statistical Research and Education

Wednesday 8 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)

Abstract

While there is a lot of hype about Artificial Intelligence (AI) and Generative AI, we sometimes forget the statistics and mathematics behind the curtain. The main objective of AI is to make “data-informed” decisions, using data and optimization algorithms to ensure accurate predictions, at least for the data the algorithm is trained on. The problem is that this training data can sometimes be incomplete. The immediate question is: does this have consequences for the decision-making process, even if we achieve 99% accuracy on the training data? Could statistical methods such as stratification help in this process? Mathematically and computationally, this could be a big issue: with such large data sets, how do we even begin to determine the proportions of the various attributes in the training data? Could rank-based methods built on Non-negative Factorization resolve some of the complexities of this problem? In this talk we will discuss incomplete data and algorithmic bias, in an effort to understand the intricacies of the black box that is AI.
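The question of whether 99% training accuracy can still hide a problem, and how stratification relates to attribute proportions, can be made concrete with a small sketch. Everything here is a hypothetical illustration, not material from the talk: the toy data, the group labels "A" and "B", and the helper names `stratified_sample`, `accuracy` and `accuracy_by_group` are assumptions. The sketch shows plain proportional stratified sampling, not the rank-based Non-negative Factorization approach the talk proposes.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, key, n, seed=0):
    """Proportional stratified sample: each stratum keeps roughly its population share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[key(r)].append(r)
    total = len(records)
    sample = []
    for members in strata.values():
        # Proportional allocation, but never drop a stratum entirely.
        k = max(1, round(n * len(members) / total))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


def accuracy(records, predict):
    """Fraction of records whose prediction matches the stored label."""
    return sum(predict(r) == r["label"] for r in records) / len(records)


def accuracy_by_group(records, key, predict):
    """Accuracy computed separately within each group (stratum)."""
    hits, counts = Counter(), Counter()
    for r in records:
        g = key(r)
        counts[g] += 1
        hits[g] += predict(r) == r["label"]
    return {g: hits[g] / counts[g] for g in counts}


def group(r):
    return r["group"]


def always_one(r):
    # Hypothetical "model" that only ever predicts the majority label.
    return 1


# Hypothetical toy data: 990 records from group "A" (label 1), 10 from group "B" (label 0).
records = ([{"group": "A", "label": 1} for _ in range(990)]
           + [{"group": "B", "label": 0} for _ in range(10)])

print(accuracy(records, always_one))                  # 0.99
print(accuracy_by_group(records, group, always_one))  # {'A': 1.0, 'B': 0.0}
sample = stratified_sample(records, group, n=100)
print(Counter(group(r) for r in sample))              # Counter({'A': 99, 'B': 1})
```

The overall accuracy of 99% coexists with 0% accuracy on the small group, which is one way the "consequence" in the abstract can surface. Proportional allocation also reproduces the population's imbalance (a single "B" record in a sample of 100); a common alternative is disproportionate allocation that oversamples small strata, at the cost of reweighting when estimating population-level quantities.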