Accuracy estimation and parameter advising for protein multiple sequence alignment

John Kececioglu, Dan Deblasio

Research output: Contribution to journal, Article, peer-review

17 Scopus citations

Abstract

We develop a novel and general approach to estimating the accuracy of multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new task that we call parameter advising: the problem of choosing values for alignment scoring function parameters from a given set of choices to maximize the accuracy of a computed alignment. For protein alignments, we consider twelve independent features that contribute to alignment quality. We learn an accuracy estimator that is a polynomial function of these features, determining its coefficients by using mathematical optimization to minimize its error with respect to true accuracy. Compared to prior approaches for estimating accuracy, our new approach (a) introduces novel feature functions that measure nonlocal properties of an alignment yet are fast to evaluate, (b) considers more general classes of estimators beyond linear combinations of features, and (c) develops new regression formulations for learning an estimator from examples; in addition, for parameter advising, we (d) determine the optimal parameter set of a given cardinality, which specifies the best parameter values from which to choose. Our estimator, which we call Facet (for "feature-based accuracy estimator"), yields a parameter advisor that on the hardest benchmarks provides more than a 27% improvement in accuracy over the best default parameter choice, and for parameter advising significantly outperforms the best prior approaches to assessing alignment quality.
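The estimator-plus-advisor pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Facet itself: it assumes the feature values of each alignment are already computed, uses a degree-1 polynomial (linear) estimator fitted by ordinary least squares as a stand-in for the paper's own regression formulations, and finds a best parameter set of cardinality k by brute-force enumeration of subsets, which is only practical for small sets. All function names (`fit_linear_estimator`, `estimate`, `advise`, `best_advisor_set`) are hypothetical.

```python
from itertools import combinations


def fit_linear_estimator(features, accuracies):
    """Stand-in fit: ordinary least squares for accuracy ~ c0 + sum_i c_i * g_i.

    Not the paper's regression formulation; solves the normal equations
    (X^T X) c = X^T y by Gaussian elimination with partial pivoting.
    """
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept column
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(rows, accuracies)) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        coeffs[r] = (b[r] - sum(a[r][c] * coeffs[c] for c in range(r + 1, n))) / a[r][r]
    return coeffs


def estimate(coeffs, feats):
    """Estimated accuracy of one alignment from its feature vector."""
    return coeffs[0] + sum(c * g for c, g in zip(coeffs[1:], feats))


def advise(coeffs, candidates):
    """Pick the parameter choice whose alignment has the highest estimated accuracy.

    candidates maps each parameter choice to the feature vector of the
    alignment computed with that choice.
    """
    return max(candidates, key=lambda p: estimate(coeffs, candidates[p]))


def best_advisor_set(coeffs, examples, params, k):
    """Brute-force search for the k-element parameter set that maximizes the
    average true accuracy of the alignments the advisor selects.

    Each example maps every parameter choice to (feature vector, true accuracy).
    Cost grows as C(len(params), k), so this only suits small inputs.
    """
    def avg_accuracy(subset):
        total = 0.0
        for ex in examples:
            choice = max(subset, key=lambda p: estimate(coeffs, ex[p][0]))
            total += ex[choice][1]  # score by the true accuracy of the pick
        return total / len(examples)

    return max(combinations(params, k), key=avg_accuracy)
```

The advisor never sees true accuracy; it only ranks candidates by estimated accuracy, which is why the quality of the estimator directly bounds the quality of the advice. True accuracy is needed only when learning the estimator and when choosing the parameter set from training examples.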

Original language: English (US)
Pages (from-to): 259-279
Number of pages: 21
Journal: Journal of Computational Biology
Volume: 20
Issue number: 4
State: Published (Apr 1 2013)

Keywords

  • Accuracy assessment
  • Feature functions
  • Machine learning
  • Parameter choice
  • Sequence alignment

ASJC Scopus subject areas

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics
