Towards Fundamental Limits for Active Multi-distribution Learning

Research output: Contribution to journal › Conference article › peer-review

Abstract

Multi-distribution learning extends agnostic Probably Approximately Correct (PAC) learning to the setting in which a family of k distributions, {D_i}_{i∈[k]}, is considered and a classifier's performance is measured by its error under the worst distribution. This problem has attracted significant recent interest due to its applications in collaborative learning, fairness, and robustness. Despite a rather complete picture of the sample complexity of passive multi-distribution learning, research on active multi-distribution learning remains scarce, with algorithms whose optimality remains unknown. In this paper, we develop new algorithms for active multi-distribution learning and establish improved label complexity upper and lower bounds, in both distribution-dependent and distribution-free settings. Specifically, we prove upper bounds of (Formula Presented) in the realizable and agnostic settings respectively, where θ_max is the maximum disagreement coefficient among the k distributions, d is the VC dimension of the hypothesis class, ν is the multi-distribution error of the best hypothesis, and ε is the target excess error. Moreover, we show that the bound in the realizable setting is information-theoretically optimal and that the kν/ε² term in the agnostic setting is fundamental for proper learners. We also establish an instance-dependent sample complexity bound for passive multi-distribution learning that smoothly interpolates between the realizable and agnostic regimes (Blum et al., 2017; Zhang et al., 2024), which may be of independent interest.

Original language: English (US)
Journal: Proceedings of Machine Learning Research
Volume: 291
State: Published - 2025
Event: 38th Annual Conference on Learning Theory, COLT 2025 - Lyon, France
Duration: Jun 30 2025 - Jul 4 2025

Keywords

  • Active Learning
  • Multi-distribution Learning
  • Sample Complexity
  • Statistical Learning Theory

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence
