Abstract
This paper is concerned with computationally efficient learning of homogeneous sparse halfspaces in ℝᵈ under noise. Although recent works have established attribute-efficient learning algorithms under various types of label noise (e.g. bounded noise), it remains an open question when and how s-sparse halfspaces can be efficiently learned under the challenging malicious noise model, where an adversary may corrupt both the unlabeled examples and the labels. We answer this question in the affirmative by designing a computationally efficient active learning algorithm with near-optimal label complexity Õ(s · log⁴(d/ε)) and noise tolerance η = Ω(ε), where ε ∈ (0, 1) is the target error rate, under the assumption that the distribution over the (uncorrupted) unlabeled examples is isotropic log-concave. Our algorithm can be straightforwardly tailored to the passive learning setting, and we show that its sample complexity, Õ((1/ε) · s² · log⁵ d), also enjoys attribute efficiency. Our main techniques include attribute-efficient paradigms for soft outlier removal and for empirical risk minimization, and a new analysis of uniform concentration for unbounded instances; all of them crucially exploit the sparsity structure of the underlying halfspace.
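To make the problem setting concrete, the following is a minimal sketch (not the paper's algorithm) of the data model the abstract describes: a homogeneous s-sparse halfspace sign(⟨w, x⟩) labeling isotropic unlabeled examples, with an η-fraction of examples corrupted arbitrarily by an adversary. The variable names and the standard-Gaussian choice (a special case of an isotropic log-concave distribution) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, s = 100, 5  # ambient dimension d, sparsity s

# A homogeneous s-sparse halfspace: unit vector w with at most s nonzeros.
w = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
w[support] = rng.normal(size=s)
w /= np.linalg.norm(w)

def halfspace_label(X, w):
    """Label each row x by the homogeneous halfspace sign(<w, x>)."""
    return np.sign(X @ w)

# Uncorrupted instances drawn from an isotropic distribution
# (standard Gaussian used here for illustration).
m = 1000
X = rng.normal(size=(m, d))
y = halfspace_label(X, w)

# Malicious noise: an adversary may replace an eta-fraction of the
# (instance, label) pairs with arbitrary values.
eta = 0.05
corrupt = rng.choice(m, size=int(eta * m), replace=False)
X[corrupt] = rng.uniform(-10.0, 10.0, size=(len(corrupt), d))  # arbitrary x
y[corrupt] = rng.choice([-1.0, 1.0], size=len(corrupt))        # arbitrary y
```

The learner observes only the corrupted sample (X, y); the paper's goal is to recover a halfspace with error at most ε using labels and samples that scale with s rather than d.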
Original language | English (US)
---|---
Pages (from-to) | 1072-1113
Number of pages | 42
Journal | Proceedings of Machine Learning Research
Volume | 132
State | Published - 2021
Event | 32nd International Conference on Algorithmic Learning Theory, ALT 2021 - Virtual, Online
Duration | Mar 16 2021 → Mar 19 2021
Keywords
- attribute efficiency
- halfspaces
- malicious noise
- passive and active learning
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability