Do missing data influence the accuracy of divergence-time estimation with BEAST?

Yuchi Zheng, John J. Wiens

Research output: Contribution to journal > Article > peer-review

65 Scopus citations


Time-calibrated phylogenies have become essential to evolutionary biology. A recurrent and unresolved question for dating analyses is whether genes with missing data cells should be included or excluded. This issue is particularly unclear for the most widely used dating method, the uncorrelated lognormal approach implemented in BEAST. Here, we test the robustness of this method to missing data. We compare divergence-time estimates from a nearly complete dataset (20 nuclear genes for 32 species of squamate reptiles) to those from subsampled matrices, including those with 5 or 2 complete loci only and those with 5 or 8 incomplete loci added. In general, missing data had little impact on estimated dates (mean error of ~5 Myr per node or less, given an overall age of ~220 Myr in squamates), even when 80% of sampled genes had 75% missing data. Mean errors were somewhat higher when all genes were 75% incomplete (~17 Myr). However, errors increased dramatically when only 2 of 9 fossil calibration points were included (~40 Myr), regardless of missing data. Overall, missing data (and even numbers of genes sampled) may have only minor impacts on the accuracy of divergence dating with BEAST, relative to the dramatic effects of fossil calibrations.

Original language: English (US)
Pages (from-to): 41-49
Number of pages: 9
Journal: Molecular Phylogenetics and Evolution
State: Published - Apr 1 2015


Keywords

  • Accuracy
  • Divergence dating
  • Fossil calibration
  • Missing data
  • Relaxed clock

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics

