Multiple comparisons in induction algorithms

David D. Jensen, Paul R. Cohen

    Research output: Contribution to journal (Article, peer-reviewed)

    148 Scopus citations

    Abstract

    A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.
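The mechanism described in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' analysis. It assumes independent comparisons, random binary "attributes" that are unrelated to random binary labels, and accuracy as the evaluation function. The names `familywise_error`, `bonferroni_alpha`, `max_score_of_noise`, and `mean_max_score` are hypothetical and chosen only for this example.

```python
import random


def familywise_error(alpha, n):
    """Chance that at least one of n independent comparisons on pure noise
    passes a per-comparison test at level alpha."""
    return 1.0 - (1.0 - alpha) ** n


def bonferroni_alpha(alpha, n):
    """Bonferroni adjustment: test each of n comparisons at level alpha / n."""
    return alpha / n


def max_score_of_noise(n_items, n_examples, rng):
    """Score n_items random binary attributes against random binary labels
    and return the maximum accuracy, as an MCP would select it."""
    labels = [rng.randint(0, 1) for _ in range(n_examples)]
    best = 0.0
    for _ in range(n_items):
        attribute = [rng.randint(0, 1) for _ in range(n_examples)]
        accuracy = sum(a == y for a, y in zip(attribute, labels)) / n_examples
        best = max(best, accuracy)
    return best


def mean_max_score(n_items, n_examples, trials, seed=0):
    """Average the selected (maximum) score over repeated trials."""
    rng = random.Random(seed)
    total = sum(max_score_of_noise(n_items, n_examples, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(
            n,
            round(mean_max_score(n, n_examples=50, trials=200), 3),
            round(familywise_error(0.05, n), 3),
            round(familywise_error(bonferroni_alpha(0.05, n), n), 3),
        )
```

Every attribute here is noise, so any single score has an expected accuracy of 0.5, yet the selected maximum drifts upward as the number of compared items grows. Under the independence assumption, testing 10 items at a per-comparison level of 0.05 gives roughly a 0.40 chance of at least one spurious pass, and the Bonferroni-adjusted level keeps that chance below 0.05.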

    Original language: English (US)
    Pages (from-to): 309-338
    Number of pages: 30
    Journal: Machine Learning
    Volume: 38
    Issue number: 3
    State: Published - 2000

    ASJC Scopus subject areas

    • Software
    • Artificial Intelligence
