TY - GEN
T1 - Sequence alignment with GPU
T2 - 23rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2009
AU - Striemer, Gregory M.
AU - Akoglu, Ali
PY - 2009
Y1 - 2009
N2 - In bioinformatics, alignments are commonly performed in genome and protein sequence analysis for gene identification and evolutionary similarities. There are several approaches for such analysis, each varying in accuracy and computational complexity. Smith-Waterman (SW) is by far the best algorithm for its accuracy in similarity scoring. However, execution time of this algorithm on general purpose processor based systems makes it impractical for use by life scientists. In this paper we take Smith-Waterman as a case study to explore the architectural features of Graphics Processing Units (GPUs) and evaluate the challenges the hardware architecture poses, as well as the software modifications needed to map the program architecture on to the GPU. We achieve a 23x speedup against the serial version of the SW algorithm. We further study the effect of memory organization and the instruction set architecture on GPU performance. For that purpose we analyze another implementation on an Intel Quad Core processor that makes use of Intel's SIMD based SSE2 architecture. We show that if reading blocks of 16 words at a time instead of 4 is allowed, and if 64KB of shared memory as opposed to 16KB is available to the programmer, GPU performance enhances significantly making it comparable to the SIMD based implementation. We quantify these observations to illustrate the need for studies on extending the instruction set and memory organization for the GPU.
AB - In bioinformatics, alignments are commonly performed in genome and protein sequence analysis for gene identification and evolutionary similarities. There are several approaches for such analysis, each varying in accuracy and computational complexity. Smith-Waterman (SW) is by far the best algorithm for its accuracy in similarity scoring. However, execution time of this algorithm on general purpose processor based systems makes it impractical for use by life scientists. In this paper we take Smith-Waterman as a case study to explore the architectural features of Graphics Processing Units (GPUs) and evaluate the challenges the hardware architecture poses, as well as the software modifications needed to map the program architecture on to the GPU. We achieve a 23x speedup against the serial version of the SW algorithm. We further study the effect of memory organization and the instruction set architecture on GPU performance. For that purpose we analyze another implementation on an Intel Quad Core processor that makes use of Intel's SIMD based SSE2 architecture. We show that if reading blocks of 16 words at a time instead of 4 is allowed, and if 64KB of shared memory as opposed to 16KB is available to the programmer, GPU performance enhances significantly making it comparable to the SIMD based implementation. We quantify these observations to illustrate the need for studies on extending the instruction set and memory organization for the GPU.
KW - Alignment
KW - Graphics processing unit
KW - Residue
KW - Smith-waterman
UR - http://www.scopus.com/inward/record.url?scp=70449844290&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70449844290&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2009.5161066
DO - 10.1109/IPDPS.2009.5161066
M3 - Conference contribution
AN - SCOPUS:70449844290
SN - 9781424437504
T3 - IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium
BT - IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium
Y2 - 23 May 2009 through 29 May 2009
ER -