TY - GEN
T1 - Bit-Wise and Multi-GPU Implementations of the DNA Recombination Algorithm
AU - Tavakoli Yazdi, Elnaz
AU - Limaye, Ankur
AU - Akoglu, Ali
AU - Adegbija, Tosiron
AU - Buntzman, Adam
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - The V(D)J recombination is the primary mechanism for generating a diverse repertoire of T-cell receptors (TCRs) essential to the adaptive immune system for recognizing a wide variety of diseases. However, modeling the TCR repertoire is computationally challenging as the total number of TCRs to be generated and processed can exceed 10^18 sequences. We propose a bit-wise implementation of the V(D)J recombination algorithm, which reduces the memory footprint and execution time by factors of 4 and 2, respectively, compared to the state-of-the-art GPU implementation. We also present a multi-GPU implementation, experimentally identify suitable workload partitioning strategies for both single- and multi-GPU implementations, and, finally, expose the relationship between the workload size and limited scalability offered by the algorithm on a cluster with up to eight GPUs. We show that the bit-wise implementation reduces the execution time from 40.5 hours to 19 hours on a single GPU and 4.4 hours on an eight-GPU configuration.
AB - The V(D)J recombination is the primary mechanism for generating a diverse repertoire of T-cell receptors (TCRs) essential to the adaptive immune system for recognizing a wide variety of diseases. However, modeling the TCR repertoire is computationally challenging as the total number of TCRs to be generated and processed can exceed 10^18 sequences. We propose a bit-wise implementation of the V(D)J recombination algorithm, which reduces the memory footprint and execution time by factors of 4 and 2, respectively, compared to the state-of-the-art GPU implementation. We also present a multi-GPU implementation, experimentally identify suitable workload partitioning strategies for both single- and multi-GPU implementations, and, finally, expose the relationship between the workload size and limited scalability offered by the algorithm on a cluster with up to eight GPUs. We show that the bit-wise implementation reduces the execution time from 40.5 hours to 19 hours on a single GPU and 4.4 hours on an eight-GPU configuration.
KW - DNA recombination process
KW - Graphics Processing Unit (GPU)
KW - bit-wise implementation
KW - multi-GPU
UR - http://www.scopus.com/inward/record.url?scp=85080104245&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85080104245&partnerID=8YFLogxK
U2 - 10.1109/HiPC.2019.00026
DO - 10.1109/HiPC.2019.00026
M3 - Conference contribution
AN - SCOPUS:85080104245
T3 - Proceedings - 26th IEEE International Conference on High Performance Computing, HiPC 2019
SP - 131
EP - 140
BT - Proceedings - 26th IEEE International Conference on High Performance Computing, HiPC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 26th Annual IEEE International Conference on High Performance Computing, HiPC 2019
Y2 - 17 December 2019 through 20 December 2019
ER -