65th ISI World Statistics Congress 2025

Transfer learning approach to sentiment analysis using S-discordance measure

Author

Jasminka Dobsa

Co-author

  • Dalibor Buzic

Conference

65th ISI World Statistics Congress 2025

Format: IPS Abstract - WSC 2025

Keywords: classification, emotions, symbolic_data_analysis, text analysis

Session: IPS 768 - Symbolic Data Analysis for Data Science

Thursday 9 October 10:50 a.m. - 12:30 p.m. (Europe/Amsterdam)

Abstract

The aim of this research is to apply the s-discordance measure [1, 2] to the classification of textual documents according to emotions such as joy, sadness, and fear. The s-discordance measure quantifies the dissimilarity between a symbolic object and a class of such objects. The weight of a word, or index term, x is given by the s-discordance S_disc(c, P, x) of a class of documents c with respect to the collection of all classes P [3]. Classes of documents are formed according to which of the basic emotions is present in each document. In this setting, the s-discordance measure expresses the relevance of the index term x for the class of documents associated with an emotion. The relevance of term x for a class c is high if the proportion of documents in c that contain x is high and the number of classes in which that proportion is higher than in c is low. The s-discordance measure is used for the automatic acquisition of a sentiment lexicon from a collection of textual documents. Transfer learning is applied to emotion classification by combining a supervised approach, logistic regression, with an unsupervised approach that uses an emotion lexicon previously extracted from another data set.
Two data sets for emotion classification are used in the experiments: the ISEAR data set [4], in which respondents were asked to report situations in which they had experienced each of seven major emotions, and the Fairy Tales data set [5], which contains sentences from 185 children's fairy tales (by the Brothers Grimm, H.C. Andersen and B. Potter) hand-labeled according to the seven major emotions.
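The relevance criterion described in the abstract can be illustrated with a small sketch. The scoring rule below is an assumption chosen only to match the verbal description (a high within-class proportion and few classes with a higher proportion); it is not claimed to be the S_disc formula defined in the cited works. The function names, the whitespace tokenizer, and the toy documents are all hypothetical.

```python
from collections import defaultdict


def term_proportions(docs_by_class):
    """Map each class to {term: share of its documents that contain the term}."""
    props = {}
    for label, docs in docs_by_class.items():
        counts = defaultdict(int)
        for doc in docs:
            # Naive whitespace tokenizer (assumption); count each term once per document.
            for term in set(doc.lower().split()):
                counts[term] += 1
        props[label] = {t: n / len(docs) for t, n in counts.items()}
    return props


def relevance(props, label, term):
    """Illustrative weight, NOT the published S_disc definition.

    f_c is the proportion of documents in class `label` containing `term`;
    it is discounted by the share of other classes whose proportion is higher.
    """
    f_c = props[label].get(term, 0.0)
    others = [c for c in props if c != label]
    higher = sum(1 for c in others if props[c].get(term, 0.0) > f_c)
    share_higher = higher / len(others) if others else 0.0
    return f_c * (1.0 - share_higher)


def extract_lexicon(docs_by_class, top_k=3):
    """Keep the top_k highest-weighted terms per class as a toy emotion lexicon."""
    props = term_proportions(docs_by_class)
    lexicon = {}
    for label, terms in props.items():
        # Sort by descending weight; break ties alphabetically for determinism.
        ranked = sorted(terms, key=lambda t: (-relevance(props, label, t), t))
        lexicon[label] = ranked[:top_k]
    return lexicon


# Hypothetical toy corpus, not taken from ISEAR or Fairy Tales.
toy = {
    "joy": ["we won the game", "a happy day", "happy news today"],
    "sadness": ["we lost the game", "a sad day", "sad news today"],
    "fear": ["a dark night", "afraid of the dark", "the storm was loud"],
}
props = term_proportions(toy)
print(relevance(props, "joy", "happy"))
print(relevance(props, "joy", "the"))
print(extract_lexicon(toy, top_k=1))
```

In this toy setting a term such as "happy", frequent in one class and absent from the others, receives a larger weight than a common word such as "the", which is what the verbal description of relevance suggests. A lexicon obtained this way from one data set could then supply features or scores for a classifier trained on another, but that combination step is not shown here.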
[1] Diday, E. (2020). Explanatory tools for machine learning in the symbolic data analysis framework. In: Diday, E., Guan, R., Wang, H. (eds) Advances in Data Science. ISTE-Wiley.
[2] Diday, E. (2023). Introduction to the “s-concordance” and “s-discordance” of a Class with a Collection of Classes. In: Beh, E.J., Lombardo, R., Clavel, J.G. (eds) Analysis of Categorical Data from Historical Perspectives. Behaviormetrics: Quantitative Approaches to Human Behavior, 17. Springer, Singapore. DOI: 10.1007/978-981-99-5329-5_27.
[3] Verde, R., Batagelj, V., Brito, P., Silva, A.P.D., Korenjak-Cerne, S., Dobsa, J., Diday, E. (2024). New skills in symbolic data analysis for official statistics. Statistical Journal of the IAOS, 40(3), 563-579. DOI: 10.3233/SJI-24001310.
[4] “International Survey on Emotion Antecedents and Reactions” (ISEAR): data set and description of the questionnaire, data treatment, and variable abbreviations as used in the database. https://www.unige.ch/cisa/research/materials-and-online-research/research-material/, accessed September 2024.
[5] Ovesdotter Alm, C. (2009). Affect in Text and Speech. VDM Verlag Dr. Müller.