Humanising Data Beyond Numbers: Leveraging Advanced Analytics with Qualitative Insights in the Digital Era
Conference
65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Wednesday 8 October 2 p.m. - 3:40 p.m. (Europe/Amsterdam)
Abstract
Population-based surveys and registry studies provide valuable insights but often fail to reveal the experiences and narratives behind the data. While qualitative methods offer depth, their findings can be challenging to generalize across populations.
This paper first discusses how data clustering techniques, such as the Self-Organizing Map (SOM) (1), can identify meaningful groups within complex datasets. These clusters can inspire new concepts derived directly from the data. The SOM differs from many other machine learning methods in that it provides a visual representation of the cluster structure it finds. Researchers can therefore observe the groupings, their properties, and their similarity ordering at different scales, in clear contrast to black-box models. Given an ordered grouping and the general characteristics of each cluster, it becomes feasible to examine selected individual data points systematically, even in very large datasets. This makes the research both more systematic and more scalable.
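To make the clustering idea concrete, the following is a minimal sketch of SOM training in plain Python. It is not the authors' implementation or Kohonen's reference code. The 4x4 grid, the linear decay of the learning rate and neighbourhood radius, the Gaussian neighbourhood, and the toy "respondent" data are all illustrative assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position (row, col) whose weight vector is closest to x."""
    return min(weights, key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def train_som(data, rows=4, cols=4, epochs=100, lr0=0.5, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood (online updates)."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialise every map unit with a copy of a randomly chosen data point.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly towards 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks over time
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Units close to the winner on the 2-D grid move further towards x.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Hypothetical toy data: two groups of respondents described by three scaled survey variables.
rng = random.Random(42)
data, labels = [], []
for label, centre in (("A", (0.0, 0.0, 0.0)), ("B", (1.0, 1.0, 1.0))):
    for _ in range(20):
        data.append([c + rng.uniform(-0.05, 0.05) for c in centre])
        labels.append(label)

weights = train_som(data)

# Map each respondent to its best-matching unit; occupied units form the visible clusters.
hits = {}
for x, label in zip(data, labels):
    hits.setdefault(best_matching_unit(weights, x), []).append(label)

for unit in sorted(hits):
    print(unit, "".join(hits[unit]))
```

Because neighbouring grid units are trained together, similar respondents land on nearby units, which is what gives the map its similarity ordering. In practice one would use an established SOM library and inspect the map visually (for example with a U-matrix) rather than print hit lists, then read the individual records behind each occupied unit.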
Second, we explain how in-depth textual content analysis can be applied to survey responses using the FinnSurveyText tool (2). This tool identifies key concepts and their strong statistical associations across response sets, making it possible to compare and contrast the themes raised by different subgroups of respondents.
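The idea of finding terms that are statistically associated with one subgroup's responses can be illustrated with a generic sketch. This is not the finnsurveytext API (which is an R package) and not necessarily the statistic it uses. It simply ranks words by a Pearson chi-square on a 2x2 table of "responses containing the word" per group. The function names, the tokeniser, the `min_count` threshold, and the toy responses are hypothetical.

```python
import re
from collections import Counter


def doc_terms(text):
    """Lower-case a response and return its set of word tokens (no lemmatisation)."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def distinctive_terms(group1, group2, min_count=2):
    """Rank words that appear in a larger share of group1's responses than group2's."""
    df1 = Counter(t for text in group1 for t in doc_terms(text))
    df2 = Counter(t for text in group2 for t in doc_terms(text))
    n1, n2 = len(group1), len(group2)
    scores = {}
    for term in set(df1) | set(df2):
        a, c = df1[term], df2[term]  # number of responses containing the term in each group
        if a + c < min_count or a / n1 <= c / n2:
            continue  # too rare, or not over-represented in group1
        scores[term] = chi_square_2x2(a, n1 - a, c, n2 - c)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical open-ended responses from two subgroups.
remote_workers = [
    "I work remotely and miss my colleagues",
    "remote work saves commuting time",
    "remote meetings are tiring",
]
commuters = [
    "I commute by car every day",
    "the office is close to home",
    "commuting by bus takes time",
]

print(distinctive_terms(remote_workers, commuters))  # [('remote', 3.0), ('work', 3.0)]
```

With counts this small the chi-square approximation is unreliable, so the numbers only show the mechanics. For Finnish responses, lemmatisation would be needed before counting, since inflected forms of the same word would otherwise be treated as different terms.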
We propose combining these methods to create a systematic and transparent exploratory research approach that integrates quantitative and qualitative techniques. This methodology fosters concept innovation, hypothesis formation, and theory generation, closely aligning with the principles of Constructivist Grounded Theory.
The SOM has a long history in the exploratory analysis of complex phenomena across many fields, from industrial processes to language formation (1). It has also proven valuable in survey analysis for tasks such as data imputation and error detection (3). The proposed hybrid approach is likely to be particularly useful for studying societal change, where new behaviours emerge but have not yet been examined in their social context. It is also worth considering what kinds of as yet undiscovered groupings and textual characteristics could be revealed in datasets such as social and health registries.
Bibliography
[1] Kohonen, T. (2001). Self-Organizing Maps (3rd extended ed.; 1st ed. 1995, 2nd ed. 1997). Springer Series in Information Sciences, Vol. 30. Berlin, Heidelberg, New York: Springer. https://doi.org/10.1007/978-3-642-56927-2
[2] Clarke, A., Lagus, K., Laine, K., Litova, M., Nelimarkka, M., Oksanen, J., Peltonen, J., Oikarinen, T., Tirkkonen, J., Toivanen, I., & Valaste, M. (2024). finnsurveytext: Analyse Open-Ended Survey Responses in Finnish. R package version 2.0.0. https://CRAN.R-project.org/package=finnsurveytext
[3] Fessant, F., & Midenet, S. (2002). Self-organising map for data imputation and correction in surveys. Neural Computing & Applications, 10, 300-310.