TY - GEN
T1 - Comparing DNA sequence collections by direct comparison of compressed text indexes
AU - Cox, Anthony J.
AU - Jakobi, Tobias
AU - Rosone, Giovanna
AU - Schulz-Trieglaff, Ole B.
PY - 2012
Y1 - 2012
N2 - Popular sequence alignment tools such as BWA convert a reference genome to an indexing data structure based on the Burrows-Wheeler Transform (BWT), from which matches to individual query sequences can be rapidly determined. However, the utility of also indexing the query sequences themselves remains relatively unexplored. Here we show that an all-against-all comparison of two sequence collections can be computed from the BWT of each collection with the BWTs held entirely in external memory, i.e. on disk and not in RAM. As an application of this technique, we show that BWTs of transcriptomic and genomic reads can be compared to obtain reference-free predictions of splice junctions that have high overlap with results from more standard reference-based methods. Code to construct and compare the BWT of large genomic data sets is available at http://beetl.github.com/BEETL/ as part of the BEETL library.
AB - Popular sequence alignment tools such as BWA convert a reference genome to an indexing data structure based on the Burrows-Wheeler Transform (BWT), from which matches to individual query sequences can be rapidly determined. However, the utility of also indexing the query sequences themselves remains relatively unexplored. Here we show that an all-against-all comparison of two sequence collections can be computed from the BWT of each collection with the BWTs held entirely in external memory, i.e. on disk and not in RAM. As an application of this technique, we show that BWTs of transcriptomic and genomic reads can be compared to obtain reference-free predictions of splice junctions that have high overlap with results from more standard reference-based methods. Code to construct and compare the BWT of large genomic data sets is available at http://beetl.github.com/BEETL/ as part of the BEETL library.
UR - http://www.scopus.com/inward/record.url?scp=84866722538&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866722538&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-33122-0_17
DO - 10.1007/978-3-642-33122-0_17
M3 - Conference contribution
AN - SCOPUS:84866722538
SN - 9783642331213
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 214
EP - 224
BT - Algorithms in Bioinformatics - 12th International Workshop, WABI 2012, Proceedings
T2 - 12th International Workshop on Algorithms in Bioinformatics, WABI 2012
Y2 - 10 September 2012 through 12 September 2012
ER -