A novel approach for oblique decision trees for regression
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: decision-tree, machine learning, non-parametric, regression, supervised learning
Session: CPS 14 - Ordinal Data and Tree-Based Methods
Wednesday 8 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Abstract
Decision trees offer one of the most widely used supervised statistical learning methods. The primary idea behind them is to split the feature space recursively into smaller and more homogeneous partitions, according to one predictor at a time. Despite being widely used, decision trees have some noticeable drawbacks, such as the inability to capture linear relationships between distinct variables. Oblique decision trees are an evolution of the axis-parallel tree that addresses this restriction. The hyperplanes used in oblique decision trees are not necessarily parallel to the feature axes; they may point in any direction. In fact, the hyperplane built by an oblique tree is a linear combination of a subset of the features; with respect to the axes this hyperplane is therefore oblique, hence the name. This flexibility enables oblique decision trees to represent linear and intricate feature relationships more effectively. However, the main issue is the computational complexity of choosing the best oblique hyperplane at each node. Because of this, much of the research on oblique decision trees concentrates on heuristics for determining the optimal oblique split.
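A toy illustration of the two split types may help fix ideas; the feature values and hyperplane coefficients below are hypothetical and not taken from the abstract. An axis-parallel split thresholds a single feature, while an oblique split thresholds a linear combination of several features.

import numpy as np

x = np.array([2.0, -1.0, 0.5])       # one observation with three features

axis_parallel = x[0] <= 1.5          # split on a single predictor
w = np.array([0.7, -0.2, 0.4])       # hypothetical oblique hyperplane coefficients
oblique = float(w @ x) <= 1.5        # split on a linear combination of features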
This work presents a novel approach to building oblique decision trees for regression tasks: the Tree Oblique for Regression with a weighted Support vector machine (TORS). TORS works just like a standard decision tree algorithm, splitting the data recursively; the main changes lie in the node-splitting procedure. Given a matrix of p features, n observations, and a continuous target variable y, TORS first applies a variable selection step, retaining only the features most correlated with the dependent variable y. This subset of independent variables is then passed to the weighted SVM classifier. However, the classifier requires a categorical response, so y is transformed into a dichotomous variable using a set of quantile values. For each quantile, we fit a weighted SVM with a linear kernel, which yields a hyperplane and a corresponding decrease in deviance. The hyperplane with the largest deviance decrease is then chosen as the splitting hyperplane. This is repeated recursively until a stopping criterion is met. A key element of the algorithm is the use of the weighted SVM: when y is dichotomized, information is lost, but by giving each instance a weight proportional to the magnitude of y, part of it is preserved. This makes node splitting more effective than using a standard SVM as the classifier. The performance of TORS will then be evaluated on both artificial and real-world datasets, with a focus on its interpretability.
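The sketch below outlines one node-splitting step as described above, assuming scikit-learn's SVC as the weighted linear SVM. The quantile grid, the correlation-based screening, the magnitude-based weighting scheme, and the sum-of-squares deviance are illustrative assumptions, not the authors' exact specification.

import numpy as np
from sklearn.svm import SVC

def tors_split(X, y, n_top_features=5, quantiles=(0.25, 0.5, 0.75)):
    """Return (svm, feature_idx, side_mask) for the best oblique split, or None."""
    # 1) Screen features: keep those most correlated (in absolute value) with y.
    corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    top = np.argsort(corr)[::-1][:n_top_features]
    Xs = X[:, top]

    # 2) Parent deviance: sum of squared deviations from the node mean.
    parent_dev = np.sum((y - y.mean()) ** 2)

    # Weights proportional to the magnitude of y, so the SVM retains part of the
    # information lost when y is dichotomized (hypothetical weighting scheme).
    w = np.abs(y - y.mean()) + 1e-8

    best = None
    for q in quantiles:
        # 3) Dichotomize y at the q-th quantile and fit a weighted linear SVM.
        z = (y > np.quantile(y, q)).astype(int)
        if z.min() == z.max():            # degenerate dichotomization, skip
            continue
        svm = SVC(kernel="linear", C=1.0)
        svm.fit(Xs, z, sample_weight=w)

        # 4) Deviance decrease implied by the partition the hyperplane induces.
        side = svm.predict(Xs).astype(bool)
        if side.all() or (~side).all():   # hyperplane does not split the node
            continue
        child_dev = (np.sum((y[side] - y[side].mean()) ** 2)
                     + np.sum((y[~side] - y[~side].mean()) ** 2))
        gain = parent_dev - child_dev
        if best is None or gain > best[0]:
            best = (gain, svm, top, side)

    # 5) Return the hyperplane with the largest deviance decrease.
    return None if best is None else best[1:]

In a full tree-building routine this step would be applied recursively to the two child nodes defined by side_mask until a stopping criterion (e.g. minimum node size) is met.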