TY - JOUR

T1 - Approximation algorithms for multiple sequence alignment under a fixed evolutionary tree

AU - Ravi, R.

AU - Kececioglu, John D.

N1 - Funding Information:
* Dedicated to the memory of Eugene Lawler. An earlier version of this paper was presented at the 5th Symposium on Combinatorial Pattern Matching [ 141. * Corresponding author. ’ Research supported by a DOE Human Genome Distinguished Postdoctoral Fellowship.

PY - 1998/11/9

Y1 - 1998/11/9

N2 - We consider the problem of multiple sequence alignment under a fixed evolutionary tree: given a tree whose leaves are labeled by sequences, find ancestral sequences to label its internal nodes so as to minimize the total length of the tree, where the length of an edge is the edit distance between the sequences labeling its endpoints. We present a new polynomial-time approximation algorithm for this problem, and analyze its performance on regular d-ary trees with d a constant. On such a tree, the algorithm finds a solution within a factor (d + 1)/(d - 1) of the minimum in Q(kdT(d, n)+k2dn2) time, where k is the number of leaves in the tree, n is the length of the longest sequence labeling a leaf, and T(d, n) is the time to compute a Steiner point for d sequences of length at most n. (A Steiner point for a set script capital L sign of sequences is a sequence P that minimizes the sum of the edit distances from P to each sequence in script capital L sign. The time T(d, n) is O(d2dnd), given O(dsd+l)-time preprocessing for an alphabet of size s.) The approximation algorithm is conceptually simple and easy to implement, and actually applies to any metric space in which a Steiner point for any fixed-sized set can be computed in polynomial time. We also introduce a new problem, bottleneck tree-alignment, in which the objective is to label the internal nodes of the tree so as to minimize the length of the longest edge. We describe an exponential-time exact algorithm for the case of unit-cost edit operations, and show there is a simple linear-time approximation algorithm for the general case that finds a solution within a factor O(log k) of the minimum. 1998 Published by Elsevier Science B.V. All rights reserved.

AB - We consider the problem of multiple sequence alignment under a fixed evolutionary tree: given a tree whose leaves are labeled by sequences, find ancestral sequences to label its internal nodes so as to minimize the total length of the tree, where the length of an edge is the edit distance between the sequences labeling its endpoints. We present a new polynomial-time approximation algorithm for this problem, and analyze its performance on regular d-ary trees with d a constant. On such a tree, the algorithm finds a solution within a factor (d + 1)/(d - 1) of the minimum in Q(kdT(d, n)+k2dn2) time, where k is the number of leaves in the tree, n is the length of the longest sequence labeling a leaf, and T(d, n) is the time to compute a Steiner point for d sequences of length at most n. (A Steiner point for a set script capital L sign of sequences is a sequence P that minimizes the sum of the edit distances from P to each sequence in script capital L sign. The time T(d, n) is O(d2dnd), given O(dsd+l)-time preprocessing for an alphabet of size s.) The approximation algorithm is conceptually simple and easy to implement, and actually applies to any metric space in which a Steiner point for any fixed-sized set can be computed in polynomial time. We also introduce a new problem, bottleneck tree-alignment, in which the objective is to label the internal nodes of the tree so as to minimize the length of the longest edge. We describe an exponential-time exact algorithm for the case of unit-cost edit operations, and show there is a simple linear-time approximation algorithm for the general case that finds a solution within a factor O(log k) of the minimum. 1998 Published by Elsevier Science B.V. All rights reserved.

KW - Approximation algorithms

KW - Computational biology

KW - Evolutionary trees

KW - Multiple sequence alignment

UR - http://www.scopus.com/inward/record.url?scp=0013695811&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0013695811&partnerID=8YFLogxK

U2 - 10.1016/S0166-218X(98)00079-1

DO - 10.1016/S0166-218X(98)00079-1

M3 - Article

AN - SCOPUS:0013695811

SN - 0166-218X

VL - 88

SP - 355

EP - 366

JO - Discrete Applied Mathematics

JF - Discrete Applied Mathematics

IS - 1-3

ER -