We create optimal decision trees for classification tasks. We optimize an objective that penalizes the number of leaves in a tree, favoring models that are sparse and interpretable. Sparse decision trees are interpretable models with greater flexibility and power than rule lists (the latter being a special case of the former).
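Concretely, for a tree t with K leaves, the objective is the misclassification loss plus a per-leaf penalty, R(t) = loss(t, X, y) + λK. Below is a minimal sketch of this objective; the `tree.predict` and `tree.num_leaves` methods and the regularizer value are hypothetical placeholders, not our actual implementation.

```python
def regularized_objective(tree, X, y, lam=0.005):
    """Sparsity-penalized objective: misclassification rate plus
    lam per leaf. Larger lam values favor smaller trees."""
    n = len(y)
    errors = sum(1 for xi, yi in zip(X, y) if tree.predict(xi) != yi)
    return errors / n + lam * tree.num_leaves()
```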
Inspired by our success in producing certifiably optimal rule lists, we turn to the much harder problem of finding optimal decision trees. Decision tree algorithms have been among the most popular approaches to interpretable machine learning since the early 1980s. We address the problem that has plagued decision tree algorithms since their inception: their lack of certified optimality. Decision tree algorithms are often greedy or myopic and sometimes produce unquestionably suboptimal models.
OSDT is the first practical algorithm for finding Optimal Sparse Decision Trees for binary variables. The algorithm combines strong analytical bounds that reduce the search space with custom data structures and computational caching to produce a performant algorithm for finding optimal decision trees. As has been done in the context of rule lists, our objective incorporates a penalty on the number of leaves in a tree, favoring sparse and accurate models.
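The flavor of these bounds is branch-and-bound: a partial tree's objective can be bounded from below by the loss already incurred in its fixed leaves plus the leaf penalty, and any partial tree whose bound meets or exceeds the best complete tree found so far can be discarded. The sketch below is schematic; `lower_bound`, `is_complete`, `expansions`, and `objective` are hypothetical stand-ins for the paper's bounds and data structures.

```python
def branch_and_bound(partial, best):
    """Prune any partial tree whose objective lower bound already
    meets or exceeds the objective of the best tree found so far."""
    if partial.lower_bound() >= best.objective:
        return best  # no completion of this partial tree can win
    if partial.is_complete():
        return partial if partial.objective < best.objective else best
    for child in partial.expansions():  # refine one unresolved leaf
        best = branch_and_bound(child, best)
    return best
```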
GOSDT builds on OSDT, incorporating two natural extensions: a collection of additional objective functions and a novel dynamic-programming-with-bounds algorithm.
First, we add support for additional loss functions, such as weighted accuracy and balanced accuracy, as well as more complex objectives that quantify the optimality of decision boundaries, e.g., AUC (Area Under the ROC Curve).
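For example, balanced accuracy averages the per-class accuracies, so errors on a rare class are not drowned out by the majority class. A small illustration of the corresponding loss (assuming NumPy arrays; not the library's internal code):

```python
import numpy as np

def balanced_accuracy_loss(y_true, y_pred):
    """1 - balanced accuracy: mean per-class error rate, so each
    class contributes equally regardless of its frequency."""
    classes = np.unique(y_true)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return 1.0 - float(np.mean(per_class))
```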
Second, decision trees, when optimized over additive losses, naturally lend themselves to a dynamic programming approach. Any node in an optimal decision tree is itself the root of a decision tree that is optimal for the data points that pass through that node. Leveraging this fact, we create a dynamic programming approach (with bounds) to enable parallelism and reduce the runtime and memory usage of our decision tree algorithm.
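The recursion below sketches that decomposition for binary features: each subproblem is a set of rows, and its optimal cost is the cheaper of a single leaf or the best split, with child subproblems memoized. This is a toy version under assumed binary features; it omits the bounds, shared caching, and parallel scheduling that make GOSDT fast.

```python
def best_tree(rows, X, y, n_total, lam, memo):
    """Optimal cost for the subproblem defined by `rows` (a frozenset
    of row indices): either a single leaf or the best split, with
    both child subproblems solved recursively and memoized."""
    if rows in memo:
        return memo[rows]
    ones = sum(y[i] for i in rows)
    # Option 1: a leaf predicting the majority class on these rows.
    best = (min(ones, len(rows) - ones) / n_total + lam, None)
    # Option 2: split on a binary feature and recurse on both sides.
    for f in range(len(X[0])):
        left = frozenset(i for i in rows if X[i][f] == 0)
        right = rows - left
        if left and right:
            cost = (best_tree(left, X, y, n_total, lam, memo)[0]
                    + best_tree(right, X, y, n_total, lam, memo)[0])
            if cost < best[0]:
                best = (cost, f)
    memo[rows] = best
    return best
```

Keying the memo on the row set itself is what shares work: identical subproblems reached along different split orders are solved only once.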
While GOSDT makes it possible to find optimal decision trees, its computation time on data sets with many features can be large. GHOUL introduces smart guessing strategies that leverage black box models to reduce the computation time by multiple orders of magnitude, while providing bounds on how far the resulting trees can deviate from the black box's accuracy and expressiveness.
Our technique produces guesses for how to bin continuous features, how deep the tree should be, and lower bounds on the error of the optimal decision tree.
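As an illustration of the first kind of guess, one can fit a gradient-boosted ensemble and keep only the feature thresholds its trees actually split on, then binarize the data at those thresholds rather than at every observed value. The scikit-learn sketch below shows the idea; it is an assumed setup, not our exact pipeline.

```python
from sklearn.ensemble import GradientBoostingClassifier

def guess_thresholds(X, y, n_estimators=40):
    """Fit a black-box reference ensemble and collect the thresholds
    its trees split on; binarizing features only at these values
    can shrink the search space dramatically."""
    gbm = GradientBoostingClassifier(n_estimators=n_estimators,
                                     max_depth=1).fit(X, y)
    thresholds = {}
    for est in gbm.estimators_.ravel():
        t = est.tree_
        for node in range(t.node_count):
            if t.children_left[node] != -1:  # internal (split) node
                thresholds.setdefault(int(t.feature[node]),
                                      set()).add(float(t.threshold[node]))
    return thresholds  # {feature index: set of candidate thresholds}
```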
Systopia lab is supported by a number of government and industrial sources, including Cisco Systems, the Communications Security Establishment Canada, Intel Research, the Natural Sciences and Engineering Research Council of Canada (NSERC), Network Appliance, the Office of the Privacy Commissioner of Canada, and the National Science Foundation (NSF).