TY - GEN
T1 - Profile guided MPI protocol selection for point-to-point communication calls
AU - Marathe, Aniruddha
AU - Lowenthal, David K.
AU - Gu, Zheng
AU - Small, Matthew
AU - Yuan, Xin
PY - 2011
Y1 - 2011
N2 - Improving communication performance is critical to achieving high performance in message-passing programs. Designing new, efficient protocols to realize point-to-point and collective communication operations has therefore been an active area of research. However, the best protocol for a given communication routine is both application and architecture specific. This paper contributes a new method of selection of the optimal protocol for a given point-to-point communication pair. Our technique analyzes the MPI communication call profile of an application and uses a computation and communication model we have developed to choose the proper protocol for each communication phase. We have applied our system to MPI applications such as CG, Sweep3D and Sparse Matrix multiplication, as well as synthetic applications. Our scheme yields an improvement in total execution time of up to 20% compared to MVAPICH2 and up to 3.2% compared to the best, highly optimized communication protocol for the real applications. Furthermore, experiments on the synthetic applications show that the savings can be much more pronounced.
AB - Improving communication performance is critical to achieving high performance in message-passing programs. Designing new, efficient protocols to realize point-to-point and collective communication operations has therefore been an active area of research. However, the best protocol for a given communication routine is both application and architecture specific. This paper contributes a new method of selection of the optimal protocol for a given point-to-point communication pair. Our technique analyzes the MPI communication call profile of an application and uses a computation and communication model we have developed to choose the proper protocol for each communication phase. We have applied our system to MPI applications such as CG, Sweep3D and Sparse Matrix multiplication, as well as synthetic applications. Our scheme yields an improvement in total execution time of up to 20% compared to MVAPICH2 and up to 3.2% compared to the best, highly optimized communication protocol for the real applications. Furthermore, experiments on the synthetic applications show that the savings can be much more pronounced.
KW - MPI
KW - Point-to-point communication
KW - Protocol selection
UR - http://www.scopus.com/inward/record.url?scp=83455166564&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=83455166564&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2011.215
DO - 10.1109/IPDPS.2011.215
M3 - Conference contribution
AN - SCOPUS:83455166564
SN - 9780769543857
T3 - IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum
SP - 733
EP - 739
BT - 2011 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2011
T2 - 25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum, IPDPSW 2011
Y2 - 16 May 2011 through 20 May 2011
ER -