TY - JOUR
T1 - Extracting conflict-free information from multi-labeled trees
AU - Deepak, Akshay
AU - Fernández-Baca, David
AU - McMahon, Michelle M.
N1 - Funding Information:
This work was supported in part by National Science Foundation grant DEB-0829674. We thank Mike Sanderson for helping to motivate this work, for many discussions about the problem formulation, and for our ongoing collaboration in the STBase project. Sylvain Guillemot listened to numerous early versions of our proofs and offered many insightful comments.
PY - 2013/7/9
Y1 - 2013/7/9
AB - Background: A multi-labeled tree, or MUL-tree, is a phylogenetic tree where two or more leaves share a label, e.g., a species name. A MUL-tree can imply multiple conflicting phylogenetic relationships for the same set of taxa, but can also contain conflict-free information that is of interest and yet is not obvious. Results: We define the information content of a MUL-tree T as the set of all conflict-free quartet topologies implied by T, and define the maximal reduced form of T as the smallest tree that can be obtained from T by pruning leaves and contracting edges while retaining the same information content. We show that any two MUL-trees with the same information content exhibit the same reduced form. This introduces an equivalence relation among MUL-trees with potential applications to comparing MUL-trees. We present an efficient algorithm to reduce a MUL-tree to its maximally reduced form and evaluate its performance on empirical datasets in terms of both quality of the reduced tree and the degree of data reduction achieved. Conclusions: Our measure of conflict-free information content based on quartets is simple and topologically appealing. In the experiments, the maximally reduced form is often much smaller than the original tree, yet retains most of the taxa. The reduction algorithm is quadratic in the number of leaves and its complexity is unaffected by the multiplicity of leaf labels or the degree of the nodes.
KW - Evolutionary trees
KW - Multi-labeled trees
KW - Phylogenetic trees
KW - Reduction
KW - Singly-labeled trees
UR - http://www.scopus.com/inward/record.url?scp=84880001000&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84880001000&partnerID=8YFLogxK
U2 - 10.1186/1748-7188-8-18
DO - 10.1186/1748-7188-8-18
M3 - Article
C2 - 23837994
AN - SCOPUS:84880001000
SN - 1748-7188
VL - 8
JO - Algorithms for Molecular Biology
JF - Algorithms for Molecular Biology
IS - 1
M1 - 18
ER -