How many taxa must be sampled to identify the root node of a large clade?

Research output: Contribution to journal › Article › peer-review

28 Scopus citations


The importance of choice of taxa in phylogenetic analysis has been explored mainly with reference to its effect on the accuracy of tree estimation. Taxon sampling can also introduce other kinds of errors. Even if the sampled topology agrees with the true topology, it may not include the true root node of a clade, a node that is of interest for many reasons. Using a simple Yule model for the diversification process, the probability of identifying this node is derived under random sampling of taxa. For large clades, the minimum sample size needed to be 95% confident of identifying the root node is approximately 40 and is independent of the size of the clade. If rates of diversification differ in the two sister groups descended from the root node, the minimum sample size needed increases markedly. If these two sister groups are so different in diversity that a Yule model would be rejected by conventional diversification tests, then the necessary sample size is an order of magnitude greater than when diversification is homogeneous.
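
The 95% figure can be checked numerically with a minimal sketch. It is not the paper's own derivation; it rests on the following assumptions. Under a Yule model, the number of taxa in one of the two sister groups descended from the root of a clade of N taxa is taken to be uniformly distributed on 1, ..., N - 1. The n taxa are drawn at random without replacement. The root node counts as identified when the sample contains at least one taxon from each sister group. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_root_sampled(N: int, n: int) -> Fraction:
    """P(sample of n taxa from a clade of N taxa spans both root subclades).

    Assumes the Yule-model root split (i, N - i) is uniform over i = 1..N-1
    and that taxa are sampled at random without replacement.
    """
    total = comb(N, n)
    # The root is missed when all n sampled taxa fall in a single subclade.
    # math.comb(i, n) is 0 when n > i, so undersized subclades contribute 0.
    miss = sum(Fraction(comb(i, n) + comb(N - i, n), total) for i in range(1, N))
    return 1 - miss / (N - 1)


def prob_root_closed_form(N: int, n: int) -> Fraction:
    """Closed form of the same sum: 1 - 2(N - n) / ((N - 1)(n + 1))."""
    return 1 - Fraction(2 * (N - n), (N - 1) * (n + 1))


def min_sample_size(N: int, confidence: Fraction = Fraction(95, 100)) -> int:
    """Smallest n whose probability of including the root node reaches `confidence`."""
    for n in range(2, N + 1):
        if prob_root_closed_form(N, n) >= confidence:
            return n
    return N


if __name__ == "__main__":
    for clade_size in (100, 1_000, 10_000, 1_000_000):
        print(clade_size, min_sample_size(clade_size))
```

As N grows, the closed form tends to 1 - 2/(n + 1), which no longer depends on N. Under these assumptions the sketch returns 39 for N = 10,000, in line with the abstract's "approximately 40". The unequal-diversification case discussed in the abstract is not covered here.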

Original language: English (US)
Pages (from-to): 168-173
Number of pages: 6
Journal: Systematic Biology
Issue number: 2
State: Published - Jun 1996


  • Branching
  • Diversification
  • Phylogeny
  • Speciation
  • Taxon sampling
  • Yule model

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics

