Prediction of the Determinants of the Number of Spectators in the KBO League Using Optimized 'Gradient - Boost': 2023 Season
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: baseball, regression, sport
Session: CPS 21 - Applied Statistical Modelling
Tuesday 7 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Abstract
This study develops a predictive model for the number of spectators at Korea Baseball Organization (KBO) League games during the 2023 season, using Python and Google Colab. The analysis draws on a dataset of 720 records covering game dates, team performance metrics, weather conditions, rival team schedules, and other relevant features. Preprocessing consisted of handling missing values, creating dummy variables for categorical features, and scaling numerical variables with Min-Max normalization.
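The three preprocessing steps named above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the column names, the toy values, and the choice of median imputation are all assumptions introduced here, since the abstract does not specify them.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy records; column names and values are illustrative only,
# not the schema or data used in the study.
games = pd.DataFrame({
    "stadium": ["Jamsil", "Sajik", "Jamsil", "Gocheok"],
    "is_weekend": [1, 0, 1, 0],
    "home_rank": [2, 7, np.nan, 5],
    "temperature_c": [21.5, np.nan, 28.0, 17.0],
    "spectators": [23750, 8120, 21030, 6400],
})

numeric_cols = ["home_rank", "temperature_c"]

# Step 1: handle missing values (median imputation is an assumption).
games[numeric_cols] = games[numeric_cols].fillna(games[numeric_cols].median())

# Step 2: create dummy variables for the categorical feature.
features = pd.get_dummies(
    games.drop(columns="spectators"), columns=["stadium"], dtype=int
)

# Step 3: Min-Max normalization maps each numeric column onto [0, 1].
scaler = MinMaxScaler()
features[numeric_cols] = scaler.fit_transform(features[numeric_cols])

target = games["spectators"]
print(features)
```

In a real workflow the scaler would normally be fitted on the training split only and then applied to the test split, so that no information from the test data leaks into the model.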
Using the preprocessed data, a Gradient Boosting Regression (GBR) model was constructed and optimized through hyper-parameter tuning. The model achieved an R² score of 0.7392 in cross-validation and 0.7085 on the test data, meaning it explains approximately 70.85% of the variance in spectator attendance on data not used for training. A line graph comparing actual and predicted values showed the two series closely aligned, supporting the model's reliability.
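As a rough sketch of how a GBR model can be tuned and then scored both in cross-validation and on a held-out test set, the example below uses scikit-learn's `GradientBoostingRegressor` with `GridSearchCV`. The synthetic data, the parameter grid, the 80/20 split, and the 5-fold setting are assumptions made for illustration; the abstract does not state which search method or settings the study used.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed features (all values in [0, 1])
# and attendance target; not the study's data.
rng = np.random.default_rng(0)
X = rng.random((300, 4))
y = 10000 + 8000 * X[:, 0] - 3000 * X[:, 1] + rng.normal(0, 500, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical search grid; the study's actual grid is not given.
param_grid = {
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)

# Mean cross-validated R^2 of the best parameter combination.
cv_r2 = search.best_score_
# R^2 on held-out data, using the best model refitted on the full training set.
test_r2 = r2_score(y_test, search.predict(X_test))

print("best params:", search.best_params_)
print(f"CV R^2: {cv_r2:.4f}, test R^2: {test_r2:.4f}")
```

Reporting both figures, as the abstract does, helps show whether the tuned model generalizes: a test score close to the cross-validation score suggests the tuning did not overfit the training folds.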
Through feature importance analysis and SHapley Additive exPlanations (SHAP) values, the study identified critical determinants that positively influence spectator turnout, such as weekend games, matches held at the renowned Jamsil Baseball Stadium, higher rankings of the home team, and shorter travel distances for away teams. These insights offer valuable guidance for league operators and clubs to strategize promotional efforts, game scheduling, and fan engagement initiatives.
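The sketch below illustrates the kind of importance ranking described above on synthetic data. It is not the study's analysis: to stay within scikit-learn it substitutes permutation importance for SHAP values, and the feature names, effect sizes, and data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

# Hypothetical features loosely mirroring the determinants named in the
# abstract; the effect sizes below are invented for illustration.
rng = np.random.default_rng(0)
feature_names = ["is_weekend", "is_jamsil", "home_rank", "away_travel_km"]
n = 400
X = np.column_stack([
    rng.integers(0, 2, n),    # weekend game flag
    rng.integers(0, 2, n),    # played at Jamsil flag
    rng.integers(1, 11, n),   # home team rank, 1 (best) to 10
    rng.uniform(0, 400, n),   # away team travel distance in km
])
y = (
    8000
    + 5000 * X[:, 0]
    + 3000 * X[:, 1]
    - 400 * X[:, 2]
    - 5 * X[:, 3]
    + rng.normal(0, 500, n)
)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Impurity-based importances from the fitted trees (they sum to 1).
impurity = dict(zip(feature_names, model.feature_importances_))

# Permutation importance: mean drop in R^2 when one column is shuffled.
perm = permutation_importance(model, X, y, n_repeats=10, random_state=0)
permutation = dict(zip(feature_names, perm.importances_mean))

for name in sorted(impurity, key=impurity.get, reverse=True):
    print(f"{name:>15}  impurity={impurity[name]:.3f}  "
          f"permutation={permutation[name]:.3f}")
```

Both measures rank features by magnitude only and say nothing about direction. Per-observation attributions such as SHAP values are what make it possible to describe a factor as raising or lowering predicted attendance, as the abstract does.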
Furthermore, the study highlights two directions for future work: incorporating time series analysis techniques, and developing real-time attendance prediction models that integrate dynamic weather data and various game indicators. Through this data-driven approach, the research contributes to optimizing fan participation strategies, promoting sustainable growth, and addressing the chronic financial deficits faced by KBO clubs. The findings underscore the value of machine learning and data analytics in enhancing the sporting experience and fostering a thriving professional baseball ecosystem in South Korea.