
Sensitive and error-tolerant annotation of protein-coding DNA with BATH

Research output: Contribution to journal › Article › peer-review

Abstract

We present BATH, a tool for highly sensitive annotation of protein-coding DNA based on direct alignment of that DNA to a database of protein sequences or profile hidden Markov models (pHMMs). BATH is built on top of the HMMER3 code base, and simplifies the annotation workflow for pHMM-based translated sequence annotation by providing a straightforward input interface and easy-to-interpret output. BATH also introduces novel frameshift-aware algorithms to detect frameshift-inducing nucleotide insertions and deletions (indels). BATH matches the accuracy of HMMER3 for annotation of sequences containing no errors, and produces superior accuracy to all tested tools for annotation of sequences containing nucleotide indels. These results suggest that BATH should be used when high annotation sensitivity is required, particularly when frameshift errors are expected to interrupt protein-coding regions, as is true with long-read sequencing data and in the context of pseudogenes.

Original language: English (US)
Article number: vbae088
Journal: Bioinformatics Advances
Volume: 4
Issue number: 1
State: Published - 2024
Externally published: Yes

ASJC Scopus subject areas

  • Structural Biology
  • Molecular Biology
  • Genetics
  • Computer Science Applications
