Abstract
Phylogeneticists often design their studies to maximize the number of genes included while minimizing the overall amount of missing data. However, few studies have addressed the costs and benefits of adding characters with missing data, especially for likelihood analyses of multiple loci. In this paper, we address this topic using two empirical data sets (one from yeast and one from plants) with well-resolved phylogenies. We introduce varying amounts of missing data into varying numbers of genes and test whether the benefits of excluding genes with missing data outweigh the costs of excluding the non-missing data associated with them. We also test whether there is a proportion of missing data in the incomplete genes at which these genes cease to be beneficial or become harmful, and whether missing data consistently bias branch-length estimates.
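The abstract does not say exactly how missing data were introduced, so the following is only a hypothetical sketch of one simple scheme, not the authors' procedure. It assumes a multi-gene alignment stored as a dict of gene name to {taxon: aligned sequence}, and that "missing data" means replacing a randomly chosen proportion of taxa in selected genes entirely with `?`. It also counts the non-missing cells that would be thrown away if every incomplete gene were excluded, which is the "cost" side of the trade-off described above. All function names here are illustrative.

```python
import random

MISSING = "?"  # assumption: '?' is the only missing-data symbol considered


def add_missing_data(alignment, genes, proportion, seed=0):
    """Return a copy of `alignment` in which, for each gene in `genes`, the
    sequences of round(proportion * n_taxa) randomly chosen taxa are replaced
    entirely by missing-data symbols (hypothetical scheme)."""
    rng = random.Random(seed)
    result = {gene: dict(seqs) for gene, seqs in alignment.items()}
    for gene in genes:
        taxa = sorted(result[gene])
        n_missing = round(proportion * len(taxa))
        for taxon in rng.sample(taxa, n_missing):
            result[gene][taxon] = MISSING * len(result[gene][taxon])
    return result


def missing_fraction(alignment):
    """Fraction of all matrix cells that are missing-data symbols."""
    seqs = [s for per_gene in alignment.values() for s in per_gene.values()]
    total = sum(len(s) for s in seqs)
    missing = sum(s.count(MISSING) for s in seqs)
    return missing / total if total else 0.0


def nonmissing_cells_lost_by_exclusion(alignment):
    """Count the non-missing cells discarded if every gene that contains any
    missing data is excluded from the analysis."""
    lost = 0
    for per_gene in alignment.values():
        if any(MISSING in s for s in per_gene.values()):
            lost += sum(len(s) - s.count(MISSING) for s in per_gene.values())
    return lost


# Toy example: two genes, four taxa, 4 sites per gene (invented data).
toy = {
    "geneA": {"t1": "ACGT", "t2": "ACGA", "t3": "ACTT", "t4": "AGGT"},
    "geneB": {"t1": "GGCC", "t2": "GGCA", "t3": "GTCC", "t4": "GGCT"},
}
incomplete = add_missing_data(toy, ["geneA"], 0.5)
print(missing_fraction(incomplete))                    # 0.25
print(nonmissing_cells_lost_by_exclusion(incomplete))  # 8
```

In this toy case, making half the taxa missing in one of two genes leaves 25% of the matrix missing, and excluding that gene would also discard 8 cells of real sequence data. A study like the one described would then compare tree accuracy and branch lengths between the "include" and "exclude" matrices, which this sketch does not attempt.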
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 308-318 |
Number of pages | 11 |
Journal | Molecular Phylogenetics and Evolution |
Volume | 80 |
State | Published, Nov 1, 2014 |
Keywords
- Accuracy
- Maximum likelihood
- Missing data
- Phylogeny
ASJC Scopus subject areas
- Ecology, Evolution, Behavior and Systematics
- Molecular Biology
- Genetics