TY - GEN
T1 - Parallelization of the NAS conjugate gradient benchmark using the global arrays shared memory programming model
AU - Zhang, Yeliang
AU - Tipparaju, Vinod
AU - Nieplocha, Jarek
AU - Hariri, Salim
PY - 2005
Y1 - 2005
N2 - The NAS Conjugate Gradient (CG) benchmark is an important scientific kernel used to evaluate machine performance and compare characteristics of different programming models. Global Arrays (GA) toolkit supports a shared memory programming paradigm and offers the programmer control over the distribution and locality that are important for optimizing performance on scalable architectures. In this paper, we describe and compare two different parallelization strategies of the CG benchmark using GA and report performance results on a shared-memory system as well as on a cluster. Performance benefits of using shared memory for irregular/sparse computations have been demonstrated before in the context of the CG benchmark using OpenMP. Similarly, the GA implementation outperforms the standard MPI implementation on shared memory system, in our case the SGI Altix. However, with GA these benefits are extended to distributed memory systems and demonstrated on a Linux cluster with Myrinet.
AB - The NAS Conjugate Gradient (CG) benchmark is an important scientific kernel used to evaluate machine performance and compare characteristics of different programming models. Global Arrays (GA) toolkit supports a shared memory programming paradigm and offers the programmer control over the distribution and locality that are important for optimizing performance on scalable architectures. In this paper, we describe and compare two different parallelization strategies of the CG benchmark using GA and report performance results on a shared-memory system as well as on a cluster. Performance benefits of using shared memory for irregular/sparse computations have been demonstrated before in the context of the CG benchmark using OpenMP. Similarly, the GA implementation outperforms the standard MPI implementation on shared memory system, in our case the SGI Altix. However, with GA these benefits are extended to distributed memory systems and demonstrated on a Linux cluster with Myrinet.
UR - http://www.scopus.com/inward/record.url?scp=33746318153&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33746318153&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2005.331
DO - 10.1109/IPDPS.2005.331
M3 - Conference contribution
AN - SCOPUS:33746318153
SN - 0769523129
SN - 0769523129
SN - 9780769523125
T3 - Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005
BT - Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005
T2 - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005
Y2 - 4 April 2005 through 8 April 2005
ER -