TY - JOUR
T1 - Approximation algorithms for multiple sequence alignment under a fixed evolutionary tree
AU - Ravi, R.
AU - Kececioglu, John D.
N1 - Funding Information:
* Dedicated to the memory of Eugene Lawler. An earlier version of this paper was presented at the 5th Symposium on Combinatorial Pattern Matching [ 141. * Corresponding author. ’ Research supported by a DOE Human Genome Distinguished Postdoctoral Fellowship.
PY - 1998/11/9
Y1 - 1998/11/9
N2 - We consider the problem of multiple sequence alignment under a fixed evolutionary tree: given a tree whose leaves are labeled by sequences, find ancestral sequences to label its internal nodes so as to minimize the total length of the tree, where the length of an edge is the edit distance between the sequences labeling its endpoints. We present a new polynomial-time approximation algorithm for this problem, and analyze its performance on regular d-ary trees with d a constant. On such a tree, the algorithm finds a solution within a factor (d + 1)/(d - 1) of the minimum in Q(kdT(d, n)+k2dn2) time, where k is the number of leaves in the tree, n is the length of the longest sequence labeling a leaf, and T(d, n) is the time to compute a Steiner point for d sequences of length at most n. (A Steiner point for a set script capital L sign of sequences is a sequence P that minimizes the sum of the edit distances from P to each sequence in script capital L sign. The time T(d, n) is O(d2dnd), given O(dsd+l)-time preprocessing for an alphabet of size s.) The approximation algorithm is conceptually simple and easy to implement, and actually applies to any metric space in which a Steiner point for any fixed-sized set can be computed in polynomial time. We also introduce a new problem, bottleneck tree-alignment, in which the objective is to label the internal nodes of the tree so as to minimize the length of the longest edge. We describe an exponential-time exact algorithm for the case of unit-cost edit operations, and show there is a simple linear-time approximation algorithm for the general case that finds a solution within a factor O(log k) of the minimum. 1998 Published by Elsevier Science B.V. All rights reserved.
AB - We consider the problem of multiple sequence alignment under a fixed evolutionary tree: given a tree whose leaves are labeled by sequences, find ancestral sequences to label its internal nodes so as to minimize the total length of the tree, where the length of an edge is the edit distance between the sequences labeling its endpoints. We present a new polynomial-time approximation algorithm for this problem, and analyze its performance on regular d-ary trees with d a constant. On such a tree, the algorithm finds a solution within a factor (d + 1)/(d - 1) of the minimum in Q(kdT(d, n)+k2dn2) time, where k is the number of leaves in the tree, n is the length of the longest sequence labeling a leaf, and T(d, n) is the time to compute a Steiner point for d sequences of length at most n. (A Steiner point for a set script capital L sign of sequences is a sequence P that minimizes the sum of the edit distances from P to each sequence in script capital L sign. The time T(d, n) is O(d2dnd), given O(dsd+l)-time preprocessing for an alphabet of size s.) The approximation algorithm is conceptually simple and easy to implement, and actually applies to any metric space in which a Steiner point for any fixed-sized set can be computed in polynomial time. We also introduce a new problem, bottleneck tree-alignment, in which the objective is to label the internal nodes of the tree so as to minimize the length of the longest edge. We describe an exponential-time exact algorithm for the case of unit-cost edit operations, and show there is a simple linear-time approximation algorithm for the general case that finds a solution within a factor O(log k) of the minimum. 1998 Published by Elsevier Science B.V. All rights reserved.
KW - Approximation algorithms
KW - Computational biology
KW - Evolutionary trees
KW - Multiple sequence alignment
UR - http://www.scopus.com/inward/record.url?scp=0013695811&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0013695811&partnerID=8YFLogxK
U2 - 10.1016/S0166-218X(98)00079-1
DO - 10.1016/S0166-218X(98)00079-1
M3 - Article
AN - SCOPUS:0013695811
SN - 0166-218X
VL - 88
SP - 355
EP - 366
JO - Discrete Applied Mathematics
JF - Discrete Applied Mathematics
IS - 1-3
ER -