TY - GEN
T1 - Extending OpenMP to facilitate loop optimization
AU - Bertolacci, Ian
AU - Strout, Michelle Mills
AU - de Supinski, Bronis R.
AU - Scogland, Thomas R.W.
AU - Davis, Eddie C.
AU - Olschanowsky, Catherine
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2018.
PY - 2018
Y1 - 2018
N2 - OpenMP provides several mechanisms to specify parallel source-code transformations. Unfortunately, many compilers perform these transformations early in the translation process, often before performing traditional sequential optimizations, which can limit the effectiveness of those optimizations. Further, OpenMP semantics preclude performing those transformations in some cases prior to the parallel transformations, which can limit overall application performance. In this paper, we propose extensions to OpenMP that require the application of traditional sequential loop optimizations. These extensions can be specified to apply before, as well as after, other OpenMP loop transformations. We discuss limitations implied by existing OpenMP constructs as well as some previously proposed (parallel) extensions to OpenMP that could benefit from constructs that explicitly apply sequential loop optimizations. We present results that explore how these capabilities can lead to as much as a 20% improvement in parallel loop performance by applying common sequential loop optimizations.
KW - Heterogeneous adaptive worksharing
KW - Loop chain abstraction
KW - Loop optimization
KW - Memory transfer pipelining
UR - http://www.scopus.com/inward/record.url?scp=85057228654&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057228654&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-98521-3_4
DO - 10.1007/978-3-319-98521-3_4
M3 - Conference contribution
AN - SCOPUS:85057228654
SN - 9783319985206
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 53
EP - 65
BT - Evolving OpenMP for Evolving Architectures - 14th International Workshop on OpenMP, IWOMP 2018, Proceedings
A2 - Valero-Lara, Pedro
A2 - Bellido, Sergi Mateo
A2 - Martorell, Xavier
A2 - Labarta, Jesus
A2 - de Supinski, Bronis R.
PB - Springer-Verlag
T2 - 14th International Workshop on OpenMP, IWOMP 2018
Y2 - 26 September 2018 through 28 September 2018
ER -