
NEAR: Neural embeddings for amino acid relationships

  • Daniel Olson
  • Thomas Colligan
  • Daphne Demekas
  • Jack W. Roddy
  • Ken Youens-Clark
  • Travis J. Wheeler

Research output: Contribution to journal › Article › peer-review

Abstract

Summary: Protein language models (PLMs) have recently demonstrated potential to supplant classical protein database search methods based on sequence alignment, but are slower than common alignment-based tools and appear to be prone to a high rate of false labeling. Here, we present Neural Embeddings for Amino acid Relationships (NEAR), a method based on neural representation learning that is designed to improve both speed and accuracy of search for likely homologs in a large protein sequence database. NEAR's ResNet embedding model is trained using contrastive learning guided by trusted sequence alignments. It computes per-residue embeddings for target and query protein sequences, and identifies alignment candidates with a pipeline consisting of residue-level k-NN search and a simple neighbor aggregation scheme. Tests on a benchmark consisting of trusted remote homologs and randomly shuffled decoy sequences reveal that NEAR substantially improves accuracy relative to state-of-the-art PLMs, with lower memory requirements and faster embedding and search speed. While these results suggest that the NEAR model may be useful for standalone homology detection with increased sensitivity over standard alignment-based methods, in this manuscript, we focus on a more straightforward analysis of the model's value as a high-speed pre-filter for sensitive annotation. In that context, NEAR is at least 5x faster than the pre-filter currently used in the widely used profile hidden Markov model (pHMM) search tool HMMER3, and also outperforms the pre-filter used in our fast pHMM tool, nail.

Availability and implementation: NEAR is under an open-source license. Code and data curation instructions can be found at https://github.com/TravisWheelerLab/NEAR.
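The search pipeline named in the abstract (per-residue embeddings, residue-level k-NN search, neighbor aggregation) can be illustrated with a small sketch. This is not NEAR's implementation: random vectors stand in for the ResNet embeddings, a brute-force cosine-similarity scan stands in for whatever nearest-neighbor index the tool uses, and summing hit similarities per target is an assumed aggregation rule. The names `cosine`, `score_targets`, `random_vecs`, `decoy_a`, `decoy_b`, and `homolog` are hypothetical.

```python
import heapq
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def score_targets(query_emb, targets, k=3):
    """Rank target sequences for one query.

    query_emb: list of per-residue vectors for the query sequence.
    targets:   dict mapping a target name to its list of per-residue vectors.
    """
    # Flatten all target residues into one pool, remembering the owning sequence.
    pool = [(name, vec) for name, vecs in targets.items() for vec in vecs]
    scores = defaultdict(float)
    for q in query_emb:
        # Brute-force k-NN: the k pool residues most similar to this query residue.
        hits = heapq.nlargest(k, ((cosine(q, vec), name) for name, vec in pool))
        # Assumed aggregation rule: credit each hit's similarity to its sequence.
        for sim, name in hits:
            scores[name] += sim
    return dict(scores)


def random_vecs(n, d=16):
    """Random stand-ins for learned per-residue embeddings."""
    return [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]


random.seed(0)
query = random_vecs(20)
# A "homolog" whose residue embeddings are slightly perturbed copies of the query's.
homolog = [[x + random.gauss(0, 0.05) for x in vec] for vec in query]
targets = {"decoy_a": random_vecs(30), "homolog": homolog, "decoy_b": random_vecs(25)}

scores = score_targets(query, targets)
best = max(scores, key=scores.get)
print(best, {name: round(s, 2) for name, s in scores.items()})
```

In this toy setting the perturbed copy collects a near-perfect hit from every query residue, so it outranks the random decoys. In a pre-filter setting, the top-scoring targets would then be passed on to a slower, more sensitive alignment stage.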

Original language: English (US)
Pages (from-to): i449-i457
Journal: Bioinformatics
Volume: 41
State: Published - Jul 1 2025
Externally published: Yes

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
