The small-world dynamics of tree networks and data mining in phyloinformatics

William H. Piel, Michael J. Sanderson, Michael J. Donoghue

Research output: Contribution to journal, Article, peer-review

15 Scopus citations

Abstract

Motivation: A noble and ultimate objective of phyloinformatic research is to assemble, synthesize, and explore the evolutionary history of life on earth. Data mining methods for performing these tasks are not yet well developed, but one avenue of research suggests that network connectivity dynamics will play an important role in future methods. Analysis of disordered networks, such as small-world networks, has applications as diverse as disease propagation, collaborative networks, and power grids. Here we apply similar analyses to networks of phylogenetic trees in order to understand how synthetic information can emerge from a database of phylogenies.

Results: Analyses of tree network connectivity in TreeBASE show that a collection of phylogenetic trees behaves as a small-world network. On the one hand, the trees are clustered, like a non-random lattice. On the other hand, they have short characteristic path lengths, like a random graph. Tree connectivities follow a dual-scale power-law distribution (first power-law exponent ≈1.87; second ≈4.82). This unusual pattern is due, in part, to the presence of alternative tree topologies that enter the database with each published study. As expected, connectivity decreases as new trees are added to a small collection but increases as trees are added to a large one. However, the inflection point is surprisingly low: after about 600 trees, the network suddenly jumps to a higher level of coherence. More stringent definitions of 'neighbour' greatly delay the threshold at which a database achieves sufficient maturity for a coherent network to emerge, although such definitions would also likely improve focus in data mining.
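The two small-world statistics named above, clustering and characteristic path length, can be illustrated with a minimal sketch. It rests on three assumptions. First, each tree is reduced to its set of leaf taxa. Second, two trees count as 'neighbours' when they share at least `min_shared` taxa. This is a hypothetical threshold, since the abstract does not state the exact neighbour definition used. Third, a tree's connectivity is taken to be its number of neighbours (its degree). The function names and toy data are illustrative only. The clustering measure follows the common convention of averaging local clustering over all nodes, with degree < 2 contributing 0. The path length is averaged only over connected pairs.

```python
from collections import deque
from itertools import combinations


def build_tree_network(trees, min_shared=2):
    """Hypothetical neighbour rule: join two trees if they share >= min_shared taxa.

    `trees` maps a tree ID to the set of taxon names on its leaves.
    Returns an adjacency dict: tree ID -> set of neighbouring tree IDs.
    """
    adj = {t: set() for t in trees}
    for a, b in combinations(trees, 2):
        if len(trees[a] & trees[b]) >= min_shared:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def clustering_coefficient(adj):
    """Mean local clustering over all nodes; nodes with degree < 2 contribute 0."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that actually exist among this node's neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def characteristic_path_length(adj):
    """Mean shortest-path length over connected ordered pairs (BFS from every node)."""
    dist_sum, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1  # exclude src itself
    return dist_sum / pairs if pairs else float("inf")


def edge_count(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2


# Toy data (invented for illustration): tree ID -> leaf taxa.
TOY_TREES = {
    "T1": {"A", "B", "C", "D"},
    "T2": {"A", "B", "E"},
    "T3": {"C", "D", "E"},
    "T4": {"A", "B", "C"},
    "T5": {"F", "G"},
}

loose = build_tree_network(TOY_TREES, min_shared=2)
strict = build_tree_network(TOY_TREES, min_shared=3)
print(edge_count(loose), round(clustering_coefficient(loose), 3),
      round(characteristic_path_length(loose), 3))   # 4 0.467 1.333
print(edge_count(strict), clustering_coefficient(strict))  # 1 0.0
```

On this toy collection, raising the hypothetical `min_shared` threshold from 2 to 3 cuts the network from four edges to one and removes all clustering. This is a small-scale illustration of how a more stringent neighbour definition thins connectivity. Plain dictionaries keep the sketch dependency-free. For real TreeBASE-scale data, a graph library such as NetworkX would be the more practical choice.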

Original language: English (US)
Pages (from-to): 1162-1168
Number of pages: 7
Journal: Bioinformatics
Volume: 19
Issue number: 9
State: Published, Jun 12 2003
Externally published: Yes

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
