Automatic Classification of Indonesian Scientific Paper to the Sustainable Development Goals using Transfer Learning
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: classification, data-science, nlp, sdgs, text analysis, text-mining, textmining, transfer-learning
Session: CPS 75 - Machine Learning, AI and the Sustainable Development Goals
Wednesday 8 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Abstract
Research plays a significant role in advancing the United Nations Sustainable Development Goals (SDGs), and its results are recorded in scientific papers. Mapping the contents of scientific papers to SDG themes is therefore important for monitoring SDG actions in a country, including Indonesia. As the volume of scientific papers grows, manually assigning the relevant SDG categories to each paper requires considerable effort. Leveraging transfer learning, specifically pre-trained language models, we propose fine-tuned models that automatically classify the text of scientific papers into the relevant SDGs. Our approach fine-tunes a Bidirectional Encoder Representations from Transformers (BERT) model for multilabel classification of scientific papers, since a single paper may relate to more than one SDG. Because most papers are written in Indonesian or English, we fine-tune both IndoBERT, pre-trained on an Indonesian corpus, and multilingual BERT (mBERT), pre-trained on multilingual corpora. The best-performing model is then used to automatically classify Indonesian scientific papers into SDGs.
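The multilabel setup can be illustrated with a minimal sketch of what happens on top of the encoder. It assumes one output logit per SDG (17 goals), an independent sigmoid per label, a 0.5 decision threshold, binary cross-entropy as the training loss, and micro-F1 as a possible criterion for picking the best model. None of these choices (nor the names `decode`, `bce_loss`, `micro_f1`) are stated in the abstract; they are illustrative assumptions, and the BERT encoder itself is not shown.

```python
import math
from typing import List, Sequence, Set

# Illustrative assumptions, not taken from the abstract:
NUM_SDGS = 17     # one output logit per SDG
THRESHOLD = 0.5   # per-label decision threshold


def sigmoid(x: float) -> float:
    """Numerically stable logistic function: logit -> per-label probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(logits: Sequence[float], threshold: float = THRESHOLD) -> List[int]:
    """Return the 1-based SDG numbers whose probability reaches the threshold.

    Each SDG is decided independently (unlike a single-label softmax),
    so a paper can receive zero, one, or several SDGs.
    """
    return [i + 1 for i, z in enumerate(logits) if sigmoid(z) >= threshold]


def bce_loss(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the stable form max(z, 0) - z*y + log(1 + exp(-|z|)),
    which equals -[y*log(sigmoid(z)) + (1-y)*log(1 - sigmoid(z))].
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def micro_f1(preds: Sequence[Set[int]], golds: Sequence[Set[int]]) -> float:
    """Micro-averaged F1 over all (paper, SDG) decisions."""
    tp = sum(len(p & g) for p, g in zip(preds, golds))
    fp = sum(len(p - g) for p, g in zip(preds, golds))
    fn = sum(len(g - p) for p, g in zip(preds, golds))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy logits for one paper: strongly SDG 4, moderately SDG 13.
    logits = [-3.0] * NUM_SDGS
    logits[3] = 2.0    # SDG 4
    logits[12] = 0.7   # SDG 13
    print(decode(logits))                               # [4, 13]
    print(round(micro_f1([{4, 13}], [{4}]), 3))         # 0.667
```

If fine-tuning were done with the Hugging Face `transformers` library (the abstract does not say), setting `problem_type="multi_label_classification"` on a sequence-classification model selects the same sigmoid-plus-binary-cross-entropy objective.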