Abstract
Time-calibrated phylogenies have become essential to evolutionary biology. A recurrent and unresolved question for dating analyses is whether genes with missing data cells should be included or excluded. This issue is particularly unclear for the most widely used dating method, the uncorrelated lognormal relaxed-clock approach implemented in BEAST. Here, we test the robustness of this method to missing data. We compare divergence-time estimates from a nearly complete dataset (20 nuclear genes for 32 species of squamate reptiles) to those from subsampled matrices, including matrices with only 5 or 2 complete loci and matrices with 5 or 8 incomplete loci added. In general, missing data had little impact on estimated dates (mean error of ~5 Myr per node or less, given an overall age of ~220 Myr for squamates), even when 80% of sampled genes had 75% missing data. Mean errors were somewhat higher when all genes were 75% incomplete (~17 Myr). However, errors increased dramatically when only 2 of 9 fossil calibration points were included (~40 Myr), regardless of missing data. Overall, missing data (and even the number of genes sampled) may have only minor impacts on the accuracy of divergence dating with BEAST, relative to the dramatic effects of fossil calibrations.
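The abstract reports accuracy as a "mean error per node" but does not define it here. The following is a minimal sketch of one plausible reading: the mean absolute difference between a node's age estimated from the nearly complete matrix and its age estimated from a subsampled matrix, averaged over nodes shared by both trees. The function name `mean_node_error`, the node labels, and the ages are hypothetical and purely illustrative. They are not the paper's data or its exact procedure, and the sketch assumes node ages (in Myr) have already been extracted from the two BEAST trees.

```python
from statistics import mean


def mean_node_error(reference: dict[str, float], subsampled: dict[str, float]) -> float:
    """Hypothetical metric: mean |age_subsampled - age_reference| in Myr.

    Both dicts map a shared node label (e.g. a clade identifier) to its
    estimated age. Only nodes present in both trees are compared.
    """
    shared = reference.keys() & subsampled.keys()
    if not shared:
        raise ValueError("no shared nodes to compare")
    return mean(abs(subsampled[node] - reference[node]) for node in shared)


# Illustrative toy values only (not results from the study).
reference_ages = {"node_A": 220.0, "node_B": 170.0, "node_C": 110.0}
subsampled_ages = {"node_A": 226.0, "node_B": 166.0, "node_C": 115.0}
print(mean_node_error(reference_ages, subsampled_ages))  # 5.0
```

Taking absolute differences keeps overestimates and underestimates from cancelling out. A signed mean would instead measure systematic bias toward older or younger dates.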
Original language | English (US) |
---|---|
Pages (from-to) | 41-49 |
Number of pages | 9 |
Journal | Molecular Phylogenetics and Evolution |
Volume | 85 |
State | Published - Apr 1 2015 |
Keywords
- Accuracy
- BEAST
- Divergence dating
- Fossil calibration
- Missing data
- Relaxed clock
ASJC Scopus subject areas
- Ecology, Evolution, Behavior and Systematics
- Molecular Biology
- Genetics