Rule lists are a set of easily interpretable solutions to classification tasks. We develop custom discrete optimization techniques to use these interpretable models while remaining competitive with common black box approaches.

Try it in your browser [Interactive CORELS]

CORELS is an algorithm for constructing Certifiably Optimal RulE ListS to perform (binary) classification on a relational dataset. Rule lists are one-sided decision trees: they give a direct classification whenever a rule’s antecedent evaluates to true. You can think of them as lists composed of if-then-else statements. One rule antecedent can check for multiple features of a sample. Rule lists are useful because each step in the model’s decision making process is understandable by humans, thus ensuring transparency in decision processes. CORELS finds the optimal rule list and certifies its optimality against all feasible alternatives.

We use the discrete optimization technique of branch-and-bound to eliminate large parts of the search space and turn this into a computationally feasible problem. We use different types of bounds inherent to the rules themselves, bounds based on the current best solution, and bounds based on symmetries between rule lists. In addition, we design efficient data structures to minimize the memory usage and runtime of our algorithm on this factorially difficult problem. In comparison to black-box models such as neural networks, we show that it is possible to build machine learning models that are more interpretable to humans without sacrificing accuracy.

We are currently working on two major extensions of CORELS:

- Generalization: We are generalizing our binary classification objective to problems with more than two classes. We also want to experiment with objectives beyond empirical accuracy, taking imbalanced label distributions and different decision boundaries into account.
- Scalability: To scale to larger feature sets and continuous features without sacrificing optimality, we are working on a variety of improvements to the speed and memory consumption of our algorithm.

- Learning Certifiably Optimal Rule Lists in the Journal of Machine Learning Research (JMLR) [Link]
- Learning Certifiably Optimal Rule Lists for Categorical Data in the proceedings of the Conference on Knowledge Discovery and Data Mining (KDD) [Link]

- Undergraduate Thesis on a parallel CORELS Implementation [Link]

- Hayden McTavish
- Alexander Zheng
- Ali Behrouz
- Jacques Chen

Systopia lab is supported by a number of government and industrial sources, including Cisco Systems, the Communications Security Establishment Canada, Intel Research, the National Sciences and Engineering Research Council of Canada (NSERC), Network Appliance, Office of the Privacy Commissioner of Canada, and the National Science Foundation (NSF).