Comparing DNA sequence collections by direct comparison of compressed text indexes

Anthony J. Cox, Tobias Jakobi, Giovanna Rosone, Ole B. Schulz-Trieglaff

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Scopus citations

Abstract

Popular sequence alignment tools such as BWA convert a reference genome to an indexing data structure based on the Burrows-Wheeler Transform (BWT), from which matches to individual query sequences can be rapidly determined. However, the utility of also indexing the query sequences themselves remains relatively unexplored. Here we show that an all-against-all comparison of two sequence collections can be computed from the BWT of each collection with the BWTs held entirely in external memory, i.e., on disk and not in RAM. As an application of this technique, we show that BWTs of transcriptomic and genomic reads can be compared to obtain reference-free predictions of splice junctions that have high overlap with results from more standard reference-based methods. Code to construct and compare the BWT of large genomic data sets is available at http://beetl.github.com/BEETL/ as part of the BEETL library.
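
The idea of comparing two collections directly through their BWTs can be illustrated with a small sketch. This is a toy under stated assumptions, not the BEETL implementation: it builds each index in RAM by naive suffix sorting (the paper keeps the BWTs on disk), it ignores reverse complements, and the names `build_index`, `extend_left`, and `compare_kmers` are hypothetical. Both indexes are traversed in lockstep by backward extension, and each k-mer is classified as shared or as unique to one collection.

```python
from itertools import accumulate

ALPHABET = "ACGT"


def build_index(reads):
    """Toy in-memory BWT index of a read collection (naive suffix sorting)."""
    # Terminate every read with '$', which sorts before A, C, G and T.
    text = "".join(read + "$" for read in reads)
    suffix_order = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character = character preceding each sorted suffix (wraps around).
    bwt = "".join(text[i - 1] for i in suffix_order)
    # first[c]: number of characters in the text that are smaller than c.
    first = {c: sum(ch < c for ch in text) for c in ALPHABET}
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] + list(accumulate(int(ch == c) for ch in bwt))
           for c in ALPHABET}
    return first, occ, len(bwt)


def extend_left(index, interval, c):
    """Backward-search step: interval of pattern P -> interval of c + P."""
    first, occ, _ = index
    lo, hi = interval
    return first[c] + occ[c][lo], first[c] + occ[c][hi]


def compare_kmers(reads_a, reads_b, k):
    """Classify every k-mer as shared, only in A, or only in B."""
    index_a, index_b = build_index(reads_a), build_index(reads_b)
    shared, only_a, only_b = [], [], []

    def visit(kmer, iv_a, iv_b):
        in_a, in_b = iv_a[0] < iv_a[1], iv_b[0] < iv_b[1]
        if not (in_a or in_b):
            return  # pattern occurs in neither collection: prune
        if len(kmer) == k:
            if in_a and in_b:
                shared.append(kmer)
            elif in_a:
                only_a.append(kmer)
            else:
                only_b.append(kmer)
            return
        for c in ALPHABET:
            # Extend the same pattern by the same character in both indexes.
            visit(c + kmer,
                  extend_left(index_a, iv_a, c),
                  extend_left(index_b, iv_b, c))

    visit("", (0, index_a[2]), (0, index_b[2]))
    return sorted(shared), sorted(only_a), sorted(only_b)
```

For example, `compare_kmers(["ACGT", "CGTA"], ["ACGA", "GTAC"], 3)` reports `ACG` and `GTA` as shared, `CGT` as unique to the first collection, and `CGA` and `TAC` as unique to the second. In the splice-junction application, sequences present in the transcriptomic reads but absent from the genomic reads are the ones of interest.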

Original language: English (US)
Title of host publication: Algorithms in Bioinformatics - 12th International Workshop, WABI 2012, Proceedings
Pages: 214-224
Number of pages: 11
State: Published - 2012
Externally published: Yes
Event: 12th International Workshop on Algorithms in Bioinformatics, WABI 2012 - Ljubljana, Slovenia
Duration: Sep 10 2012 → Sep 12 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7534 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 12th International Workshop on Algorithms in Bioinformatics, WABI 2012
Country/Territory: Slovenia
City: Ljubljana
Period: 9/10/12 → 9/12/12

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
