Training data diversity enhances the basecalling of novel RNA modification-induced nanopore sequencing readouts

Ziyuan Wang, Ziyang Liu, Yinshan Fang, Hao Helen Zhang, Xiaoxiao Sun, Ning Hao, Jianwen Que, Hongxu Ding

Research output: Contribution to journal › Article › peer-review

Abstract

Accurately basecalling sequence backbones in the presence of nucleotide modifications remains a substantial challenge in nanopore sequencing bioinformatics. It has been extensively demonstrated that state-of-the-art basecallers are less compatible with modification-induced sequencing signals. Precise basecalling, on the other hand, is a prerequisite for virtually all downstream analyses. Here, we report that basecallers exposed to diverse training modifications gain the generalizability to analyze novel modifications. With synthesized oligos as the model system, we precisely basecall various out-of-sample RNA modifications. From the representation learning perspective, we attribute this generalizability to the basecaller representation space being expanded by diverse training modifications. Taken together, we conclude that increasing training data diversity is a paradigm for building modification-tolerant nanopore sequencing basecallers.

Original language: English (US)
Article number: 679
Journal: Nature Communications
Volume: 16
Issue number: 1
State: Published - Dec 2025

ASJC Scopus subject areas

  • General Chemistry
  • General Biochemistry, Genetics and Molecular Biology
  • General Physics and Astronomy
