Should genes with missing data be excluded from phylogenetic analyses?

Wei Jiang, Si Yun Chen, Hong Wang, De Zhu Li, John J. Wiens

Research output: Contribution to journal › Article › peer-review

96 Scopus citations


Phylogeneticists often design their studies to maximize the number of genes included but minimize the overall amount of missing data. However, few studies have addressed the costs and benefits of adding characters with missing data, especially for likelihood analyses of multiple loci. In this paper, we address this topic using two empirical data sets (in yeast and plants) with well-resolved phylogenies. We introduce varying amounts of missing data into varying numbers of genes and test whether the benefits of excluding genes with missing data outweigh the costs of excluding the non-missing data that are associated with them. We also test if there is a proportion of missing data in the incomplete genes at which they cease to be beneficial or harmful, and whether missing data consistently bias branch length estimates.
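The manipulation described above (adding missing data to a chosen number of genes at a chosen proportion) can be illustrated with a minimal sketch. This is not the authors' protocol. It assumes, purely for illustration, that missing data are introduced by replacing the entire sequence of randomly chosen taxa with "?" within randomly chosen genes. The names `introduce_missing`, `missing_fraction`, and the toy data set are hypothetical.

```python
import random


def introduce_missing(genes, n_incomplete, proportion, seed=0):
    """Return a copy of `genes` with missing data added.

    genes: dict mapping gene name -> dict mapping taxon -> aligned sequence.
    n_incomplete: number of genes that receive missing data.
    proportion: fraction of taxa masked within each incomplete gene.
    Assumption (illustrative only): a masked taxon loses its whole
    sequence for that gene, coded as '?'.
    """
    rng = random.Random(seed)
    # Copy the inner dicts so the input data set is left untouched.
    masked = {name: dict(seqs) for name, seqs in genes.items()}
    for name in rng.sample(sorted(genes), n_incomplete):
        taxa = sorted(genes[name])
        n_masked = round(proportion * len(taxa))
        for taxon in rng.sample(taxa, n_masked):
            masked[name][taxon] = "?" * len(genes[name][taxon])
    return masked


def missing_fraction(genes):
    """Fraction of all alignment cells that are coded as '?'."""
    seqs = [s for by_taxon in genes.values() for s in by_taxon.values()]
    total = sum(len(s) for s in seqs)
    return sum(s.count("?") for s in seqs) / total


# Toy data set: 4 genes x 4 taxa, 8 sites each (hypothetical sequences).
toy = {
    f"gene{g}": {f"taxon{t}": "ACGTACGT" for t in range(4)}
    for g in range(4)
}

# Make 2 of the 4 genes incomplete, each missing 50% of its taxa.
masked = introduce_missing(toy, n_incomplete=2, proportion=0.5, seed=1)
overall = missing_fraction(masked)
```

Varying `n_incomplete` and `proportion` over a grid yields the kind of treatment matrix the abstract describes. Each masked data set would then be analyzed with the tree-inference software of choice and compared with the reference phylogeny.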

Original language: English (US)
Pages (from-to): 308-318
Number of pages: 11
Journal: Molecular Phylogenetics and Evolution
State: Published - Nov 1, 2014


Keywords

  • Accuracy
  • Maximum likelihood
  • Missing data
  • Phylogeny

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics

