TY - GEN
T1 - Learning models for aligning protein sequences with predicted secondary structure
AU - Kim, Eagu
AU - Wheeler, Travis
AU - Kececioglu, John
PY - 2009
Y1 - 2009
N2 - Accurately aligning distant protein sequences is notoriously difficult. A recent approach to improving alignment accuracy is to use additional information such as predicted secondary structure. We introduce several new models for scoring alignments of protein sequences with predicted secondary structure, which use the predictions and their confidences to modify both the substitution and gap cost functions. We present efficient algorithms for computing optimal pairwise alignments under these models, all of which run in near-quadratic time. We also review an approach to learning the values of the parameters in these models called inverse alignment. We then evaluate the accuracy of these models by studying how well an optimal alignment under the model recovers known benchmark reference alignments. Our experiments show that using parameters learned by inverse alignment, these new secondary-structure-based models provide a significant improvement in alignment accuracy for distant sequences. The best model improves upon the accuracy of the standard sequence alignment model for pairwise alignment by as much as 15% for sequences with less than 25% identity, and improves the accuracy of multiple alignment by 20% for difficult benchmarks whose average accuracy under standard tools is less than 40%.
AB - Accurately aligning distant protein sequences is notoriously difficult. A recent approach to improving alignment accuracy is to use additional information such as predicted secondary structure. We introduce several new models for scoring alignments of protein sequences with predicted secondary structure, which use the predictions and their confidences to modify both the substitution and gap cost functions. We present efficient algorithms for computing optimal pairwise alignments under these models, all of which run in near-quadratic time. We also review an approach to learning the values of the parameters in these models called inverse alignment. We then evaluate the accuracy of these models by studying how well an optimal alignment under the model recovers known benchmark reference alignments. Our experiments show that using parameters learned by inverse alignment, these new secondary-structure-based models provide a significant improvement in alignment accuracy for distant sequences. The best model improves upon the accuracy of the standard sequence alignment model for pairwise alignment by as much as 15% for sequences with less than 25% identity, and improves the accuracy of multiple alignment by 20% for difficult benchmarks whose average accuracy under standard tools is less than 40%.
UR - http://www.scopus.com/inward/record.url?scp=67650296442&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67650296442&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-02008-7_36
DO - 10.1007/978-3-642-02008-7_36
M3 - Conference contribution
AN - SCOPUS:67650296442
SN - 9783642020070
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 512
EP - 531
BT - Research in Computational Molecular Biology - 13th Annual International Conference, RECOMB 2009, Proceedings
T2 - 13th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2009
Y2 - 18 May 2009 through 21 May 2009
ER -