Abstract
A basic computational problem that arises in both the construction and local-search phases of the best heuristics for multiple sequence alignment is that of aligning the columns of two multiple alignments. When the scoring function is the sum-of-pairs objective and induced pairwise alignments are evaluated using linear gap-costs, we call this problem Aligning Alignments. While seemingly a straightforward extension of two-sequence alignment, we prove it is actually NP-complete. As explained in the paper, this provides the first demonstration that minimizing linear gap-costs, in the context of multiple sequence alignment, is inherently hard. We also develop an exact algorithm for Aligning Alignments that is remarkably efficient in practice, both in time and space. Even though the problem is NP-complete, computational experiments on both biological and simulated data show we can compute optimal alignments for all benchmark instances in two standard datasets, and solve very-large random instances with highly-gapped sequences.
| Original language | English (US) |
|---|---|
| Pages | 85-96 |
| Number of pages | 12 |
| DOIs | |
| State | Published - 2004 |
| Event | RECOMB 2004 - Proceedings of the Eight Annual International Conference on Research in Computational Molecular Biology - San Diego, CA., United States Duration: Mar 27 2004 → Mar 31 2004 |
Other
| Other | RECOMB 2004 - Proceedings of the Eight Annual International Conference on Research in Computational Molecular Biology |
|---|---|
| Country/Territory | United States |
| City | San Diego, CA. |
| Period | 3/27/04 → 3/31/04 |
Keywords
- Exact algorithms
- Linear gap costs
- Multiple sequence alignment
- Sum of pairs
ASJC Scopus subject areas
- General Computer Science
- General Biochemistry, Genetics and Molecular Biology