TY - CONF
T1 - The Influence of Dimensions on the Complexity of Computing Decision Trees
AU - Kobourov, Stephen G.
AU - Löffler, Maarten
AU - Montecchiani, Fabrizio
AU - Pilipczuk, Marcin
AU - Rutter, Ignaz
AU - Seidel, Raimund
AU - Sorge, Manuel
AU - Wulms, Jules
N1 - Publisher Copyright:
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2023/6/27
Y1 - 2023/6/27
N2 - A decision tree recursively splits a feature space R^d and then assigns class labels based on the resulting partition. Decision trees have been part of the basic machine-learning toolkit for decades. A large body of work considers heuristic algorithms that compute a decision tree from training data, usually aiming in particular to minimize the size of the resulting tree. In contrast, little is known about the complexity of the underlying computational problem of computing a minimum-size tree for the given training data. We study this problem with respect to the number d of dimensions of the feature space R^d, which contains n training examples. We show that it can be solved in O(n^{2d+1}) time, but under reasonable complexity-theoretic assumptions it is not possible to achieve f(d) · n^{o(d / log d)} running time. The problem is solvable in (dR)^{O(dR)} · n^{1+o(1)} time if there are exactly two classes and R is an upper bound on the number of tree leaves labeled with the first class.
AB - A decision tree recursively splits a feature space R^d and then assigns class labels based on the resulting partition. Decision trees have been part of the basic machine-learning toolkit for decades. A large body of work considers heuristic algorithms that compute a decision tree from training data, usually aiming in particular to minimize the size of the resulting tree. In contrast, little is known about the complexity of the underlying computational problem of computing a minimum-size tree for the given training data. We study this problem with respect to the number d of dimensions of the feature space R^d, which contains n training examples. We show that it can be solved in O(n^{2d+1}) time, but under reasonable complexity-theoretic assumptions it is not possible to achieve f(d) · n^{o(d / log d)} running time. The problem is solvable in (dR)^{O(dR)} · n^{1+o(1)} time if there are exactly two classes and R is an upper bound on the number of tree leaves labeled with the first class.
UR - http://www.scopus.com/inward/record.url?scp=85168252385&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85168252385&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85168252385
T3 - Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
SP - 8343
EP - 8350
BT - AAAI-23 Technical Tracks 7
A2 - Williams, Brian
A2 - Chen, Yiling
A2 - Neville, Jennifer
PB - AAAI Press
T2 - 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Y2 - 7 February 2023 through 14 February 2023
ER -