TIES 2024

Challenges calibrating machine learning models when predicting rare events, with applications to wildfire occurrence prediction

Author

Nathan Phelps

Co-authors

  • Douglas G. Woolford
  • Daniel J. Lizotte

Conference

TIES 2024

Format: IPS Abstract

Keywords: class_imbalance, spatio-temporal

Abstract

Rare-event data are highly imbalanced because there are many more non-occurrences than occurrences. For example, wildfire occurrences are extremely rare (less than a tenth of a percent) on a fine spatio-temporal scale. Undersampling, in which the majority class is subsampled to create a more balanced dataset for training a machine learning model, is a common approach when modelling such data. However, undersampling biases the model's predictions, so those who want meaningful probability estimates must calibrate them. There are multiple ways to perform this calibration, including analytical calibration, whereby an equation maps the original predictions to new values based on the sampling rate used for the majority class, and Platt's scaling, whereby a logistic regression model is fit to the original predictions. However, these approaches do not always work as desired. When analytical calibration is used to calibrate a random forest model, the prevalence estimates from the modelling procedure depend on both (i) the sampling rate used when undersampling and (ii) the number of predictors considered at each split in the random forest. We illustrate the impact these factors can have on the number of wildfires expected in a fire season. Calibration via Platt's scaling can also lead to poor results. We demonstrate analytically that Platt's scaling cannot properly calibrate a model fit perfectly to an undersampled dataset, but that a simple transformation of the original predictions can fix this issue. Via simulation, we show how this transformation can lead to better estimates. We also show that Platt's scaling can be effective if the original predictions are miscalibrated in a particular way.
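
For reference, below is a minimal Python sketch of one standard analytical correction for undersampling, assuming the majority class (non-occurrences) is retained with sampling rate beta; it maps a prediction p_s from the model trained on the undersampled data back to the original prevalence. The abstract does not state the exact equation the authors use, so the function name and formula here are illustrative only.

    import numpy as np

    def analytical_calibration(p_s, beta):
        """Map predictions from a model trained on undersampled data back to the
        original scale, assuming a fraction `beta` of the majority class
        (non-occurrences) was retained when undersampling (illustrative formula)."""
        p_s = np.asarray(p_s, dtype=float)
        return beta * p_s / (beta * p_s - p_s + 1.0)

    # Example: a prediction of 0.5 under a 1% majority-class sampling rate
    # corresponds to roughly a 1% probability on the original scale.
    print(analytical_calibration([0.5], beta=0.01))  # approx. [0.0099]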
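Similarly, the sketch below illustrates standard Platt's scaling: a one-dimensional logistic regression is fit to the original predictions on a held-out calibration set and then used to rescale new predictions. This shows the general technique only; the specific transformation of the original predictions that the authors propose is not given in the abstract, and the helper function here is hypothetical.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_platt_scaling(scores_cal, y_cal):
        """Fit Platt's scaling: a logistic regression on a single score.
        scores_cal: uncalibrated predictions on a held-out calibration set.
        y_cal: binary labels (1 = occurrence, 0 = non-occurrence).
        Returns a function mapping new scores to calibrated probabilities."""
        X = np.asarray(scores_cal, dtype=float).reshape(-1, 1)
        lr = LogisticRegression(C=1e6)  # large C: effectively unregularized
        lr.fit(X, np.asarray(y_cal))
        return lambda s: lr.predict_proba(
            np.asarray(s, dtype=float).reshape(-1, 1))[:, 1]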