65th ISI World Statistics Congress 2025

Prediction of Internet Users in Indonesia Using Google Trends Data

Author

Atika Nashirah Hasyyati

Conference

65th ISI World Statistics Congress 2025

Format: CPS Abstract - WSC 2025

Keywords: internet, machine learning

Session: CPS 83 - Predictive Analytics and Nowcasting

Monday 6 October 5:10 p.m. - 6:10 p.m. (Europe/Amsterdam)

Abstract

Statistics Indonesia publishes the percentage of internet users in Indonesia annually, based on the Indonesian National Socioeconomic Survey (Susenas). One impact of the COVID-19 pandemic was limited access to respondents, which led the Partnership on Measuring Information and Communication Technology for Development to report the need to promote data innovations as a complement to traditional ICT data. Internet use is one of the most important ICT variables because it can describe gaps in connectivity and technological advancement. However, the number of internet users is currently published only once a year, so more timely estimates are needed. Google Trends data are free to access and available in near real time, which makes them suitable for nowcasting and prediction. This paper uses Google Trends as an alternative data source to produce monthly estimates of the percentage of individuals using the internet in Indonesia by province. The official Susenas figures for the percentage of internet users by province from 2014 to 2021 are also used. Predictors are derived from Google Trends Web Search data (candidate search keywords include Facebook, WhatsApp, and others, including top searches), and machine learning methods are then applied to predict the percentage of internet users. Several challenges had to be overcome when gathering data from the Google Trends API. More than 1.6 million data points were collected using the gtrendsR package. After data cleaning, several machine learning methods were compared for producing monthly estimates of internet users. The comparison shows that XGBoost has the highest accuracy.
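The general idea of linking an annual survey figure to a monthly search index can be illustrated with a minimal Python sketch. It is not the paper's pipeline: it assumes a single hypothetical province and a single keyword, uses made-up numbers, and substitutes an ordinary least squares line for the machine learning models (such as XGBoost) that the abstract compares. The function names `fit_line` and `monthly_estimates` are illustrative only. The monthly index is averaged to annual values so it can be matched with the annual official percentage, a model is fitted at the annual level, and the fitted model is then applied to each monthly index value.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Stand-in for the machine learning model; the abstract itself
    reports comparing several methods, with XGBoost performing best.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def monthly_estimates(intercept, slope, monthly_index):
    """Apply the annual-level fit to each month, clipped to 0..100 percent."""
    return [min(100.0, max(0.0, intercept + slope * x)) for x in monthly_index]


# Made-up illustrative inputs for one hypothetical province (NOT data from the paper).
# trends[year] holds 12 monthly search-interest values on the 0-100 Google Trends scale.
trends = {
    year: [base + month for month in range(12)]
    for year, base in [(2019, 30), (2020, 40), (2021, 50)]
}
# official[year] is the annual survey percentage of internet users (also made up).
official = {2019: 47.0, 2020: 53.0, 2021: 59.0}

# Step 1: aggregate the monthly index to annual means to align with the survey.
years = sorted(official)
annual_index = [mean(trends[y]) for y in years]

# Step 2: fit the model on the annual pairs.
intercept, slope = fit_line(annual_index, [official[y] for y in years])

# Step 3: produce monthly estimates from the monthly index.
estimates_2021 = monthly_estimates(intercept, slope, trends[2021])
print([round(e, 1) for e in estimates_2021])
```

Because the fit is linear and none of the values are clipped here, the mean of the twelve monthly estimates for a training year equals the fitted annual value. A tree-based model such as XGBoost would not guarantee this, so a real application would need to check consistency between monthly estimates and the annual survey figure separately.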