Prospects for building the tree of life from large sequence databases

Amy C. Driskell, Cécile Ané, J. Gordon Burleigh, Michelle M. McMahon, Brian C. O'Meara, Michael J. Sanderson

Research output: Contribution to journal › Article › peer-review

211 Scopus citations

Abstract

We assess the phylogenetic potential of ∼300,000 protein sequences sampled from Swiss-Prot and GenBank. Although only a small subset of these data was potentially phylogenetically informative, this subset retained a substantial fraction of the original taxonomic diversity. Sampling biases in the databases necessitate building phylogenetic data sets that have large numbers of missing entries. However, an analysis of two "supermatrices" suggests that even data sets with as much as 92% missing data can provide insights into broad sections of the tree of life.

Original language: English (US)
Pages (from-to): 1172-1174
Number of pages: 3
Journal: Science
Volume: 306
Issue number: 5699
DOIs
State: Published - Nov 12 2004
Externally published: Yes

ASJC Scopus subject areas

  • General
