Spectral CLTs with long memory for large language models
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: "asymptotic, central limit theorem,, large language models, long-memory
Session: CPS 4 - Stochastic Processes and Functional Data
Monday 6 October 5:10 p.m. - 6:10 p.m. (Europe/Amsterdam)
Abstract
Since the pioneering works of Breuer, Dobrushin, Major, Rosenblatt, Taqqu and others in the 1980s, central and noncentral limit theorems for normalized functionals $Y_t$ of Gaussian processes have been constantly refined, extended and applied to an increasing number of diverse situations. In recent years, fourth-moment CLTs, quantitative CLTs, Breuer-Major and Dobrushin-Major CLTs, de Jong CLTs, functional CLTs and others have been developed. Recently, Maini and Nourdin (2024) extended this line of research to spectral central limit theorems valid for additive functionals of isotropic and stationary Gaussian fields. Their work applies the Malliavin-Stein method and Fourier-analytic techniques to situations where $Y_t$ admits Gaussian fluctuations in a long-memory context. In another recent article, Wang et al. (2023) augmented existing language models with long-term memory: existing large language models (LLMs) can only process fixed-size inputs due to their input length limit, which prevents them from exploiting rich long-context information from past inputs. They proposed a framework of Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long histories. In our article we develop spectral central limit theorems in the context of the augmented large language models of Wang and coauthors. Our analysis is placed in a mean-field setting to derive the appropriate limit theorems in the usual two-part scheme: a nonlinear partial differential equation accounting for the mean-field limit, and a linear stochastic partial differential equation accounting for the CLT. The analysis is set in a stochastic Ising-model, interacting-particle-system perspective to capture the Transformer structure of the LLM. We present applications to datasets from finance and medical imaging. In conclusion, we discuss possible Bayesian extensions, as well as implications for statistical estimation and inference in a natural language processing context.
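For orientation, the prototypical result in this family is the classical Breuer-Major CLT. The statement below is a standard textbook form, given only to fix ideas; the notation ($X_k$, $f$, $q$, $c_p$, $\rho$) is illustrative and not taken from the abstract itself.

% Classical Breuer-Major CLT (standard form, illustrative notation).
% (X_k) is a centered stationary Gaussian sequence with covariance rho,
% f has Hermite rank q >= 1, and sum_k |rho(k)|^q < infinity.
\[
  Y_t \;=\; \frac{1}{\sqrt{t}} \sum_{k=1}^{t} f(X_k)
  \;\xrightarrow[t\to\infty]{d}\; \mathcal{N}(0,\sigma^2),
  \qquad
  \sigma^2 \;=\; \sum_{p \ge q} p!\, c_p^2 \sum_{k \in \mathbb{Z}} \rho(k)^p ,
\]
% where f = sum_{p >= q} c_p H_p is the Hermite expansion of f in terms
% of Hermite polynomials H_p.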
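The memory-augmentation mechanism of Wang et al. (2023) can be illustrated with a minimal sketch: past context chunks that no longer fit the fixed input window are cached in an external memory bank and retrieved by embedding similarity. The names below (embed, MemoryBank, top_k) and the toy hash-based embedding are hypothetical placeholders, not the actual LongMem implementation.

# Minimal sketch of memory-augmented context retrieval (assumed design,
# not Wang et al.'s actual code).
import numpy as np

def embed(tokens: list[str]) -> np.ndarray:
    """Toy embedding: hash each token to a fixed random vector, then average."""
    vecs = [np.random.default_rng(abs(hash(t)) % (2**32)).standard_normal(16)
            for t in tokens]
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)

class MemoryBank:
    """Caches embeddings of past chunks that fell out of the input window."""
    def __init__(self):
        self.keys: list[np.ndarray] = []
        self.chunks: list[list[str]] = []

    def add(self, chunk: list[str]) -> None:
        self.keys.append(embed(chunk))
        self.chunks.append(chunk)

    def top_k(self, query: list[str], k: int = 2) -> list[list[str]]:
        """Return the k cached chunks most similar to the current query."""
        if not self.keys:
            return []
        q = embed(query)
        sims = np.array([q @ key for key in self.keys])
        return [self.chunks[i] for i in np.argsort(-sims)[:k]]

# Usage: past inputs beyond the fixed window are cached, then retrieved
# to augment the current (length-limited) context.
bank = MemoryBank()
bank.add("the risk model was calibrated on 2019 data".split())
bank.add("the MRI scans were segmented before analysis".split())
print(bank.top_k("which data calibrated the risk model".split(), k=1))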
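The two-part scheme mentioned above can also be sketched in generic form. The display below is a schematic mean-field law of large numbers followed by a fluctuation CLT; the potential $V$, linearized generator $\mathcal{L}_{\mu}$ and space-time Gaussian noise $W$ are assumed placeholders, not the article's actual equations.

% Step 1 (mean-field limit): the empirical measure of N interacting units
% converges to the solution of a nonlinear (McKean-Vlasov-type) PDE.
\[
  \mu^N_t \;=\; \frac{1}{N}\sum_{i=1}^{N}\delta_{\theta^i_t}
  \;\xrightarrow[N\to\infty]{}\; \mu_t,
  \qquad
  \partial_t \mu_t \;=\; \tfrac{1}{2}\,\Delta \mu_t
  \;+\; \nabla\cdot\big(\mu_t\,\nabla V[\mu_t]\big).
\]
% Step 2 (CLT): the rescaled fluctuations converge to the solution of a
% linear SPDE driven by a space-time Gaussian noise W.
\[
  \eta^N_t \;=\; \sqrt{N}\,\big(\mu^N_t - \mu_t\big)
  \;\xrightarrow[N\to\infty]{d}\; \eta_t,
  \qquad
  \mathrm{d}\eta_t \;=\; \mathcal{L}^{*}_{\mu_t}\,\eta_t\,\mathrm{d}t
  \;+\; \mathrm{d}W_t .
\]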