65th ISI World Statistics Congress 2025

Categorical Data Encoding: from H.O. Hirschfeld to Machine Learning

Format: CPS Abstract - WSC 2025

Keywords: categorical, machine learning

Session: CPS 14 - Ordinal Data and Tree-Based Methods

Wednesday 8 October 4 p.m. - 5 p.m. (Europe/Amsterdam)

Abstract

Machine learning algorithms need qualitative data converted into numerical data, and this need has led to many (sometimes incongruous) proposals, as well as to the rediscovery of work by statisticians dating back almost a century. The optimal coding methods developed by H.O. Hirschfeld (later known as H.O. Hartley), R.A. Fisher, L. Guttman, C. Hayashi and many others are at the origin of correspondence analysis. The links with the search for transformations to normal distributions were established by H.O. Lancaster, M. Kendall and A. Stuart. Starting with the pioneering work of the 1930s, this paper provides an overview of the various approaches up to the current revival, which is driven by the need to process high-dimensional categorical data.