TY - GEN
T1 - Loop chaining
T2 - 2013 IEEE 37th Annual Computer Software and Applications Conference, COMPSAC 2013
AU - Krieger, Christopher D.
AU - Strout, Michelle Mills
AU - Olschanowsky, Catherine
AU - Stone, Andrew
AU - Guzik, Stephen
AU - Gao, Xinfeng
AU - Bertolli, Carlo
AU - Kelly, Paul H.J.
AU - Mudalige, Gihan
AU - Van Straalen, Brian
AU - Williams, Sam
PY - 2013
Y1 - 2013
N2 - There is a significant, established code base in the scientific computing community. Some of these codes have been parallelized already but are now encountering scalability issues due to poor data locality, inefficient data distributions, or load imbalance. In this work, we introduce a new abstraction called loop chaining in which a sequence of parallel and/or reduction loops that explicitly share data are grouped together into a chain. Once specified, a chain of loops can be viewed as a set of iterations under a partial ordering. This partial ordering is dictated by data dependencies that, as part of the abstraction, are exposed, thereby avoiding inter-procedural program analysis. Thus a loop chain is a partially ordered set of iterations that makes scheduling and determining data distributions across loops possible for a compiler and/or run-time system. The flexibility of being able to schedule across loops enables better management of the data locality and parallelism tradeoff. In this paper, we define the loop chaining concept and present three case studies using loop chains in scientific codes: the sparse matrix Jacobi benchmark, a domain-specific library, OP2, used in full applications with unstructured grids, and a domain-specific library, Chombo, used in full applications with structured grids. Preliminary results for the Jacobi benchmark show that a loop chain enabled optimization, full sparse tiling, results in a speedup of as much as 2.68x over a parallelized, blocked implementation on a multicore system with 40 cores.
AB - There is a significant, established code base in the scientific computing community. Some of these codes have been parallelized already but are now encountering scalability issues due to poor data locality, inefficient data distributions, or load imbalance. In this work, we introduce a new abstraction called loop chaining in which a sequence of parallel and/or reduction loops that explicitly share data are grouped together into a chain. Once specified, a chain of loops can be viewed as a set of iterations under a partial ordering. This partial ordering is dictated by data dependencies that, as part of the abstraction, are exposed, thereby avoiding inter-procedural program analysis. Thus a loop chain is a partially ordered set of iterations that makes scheduling and determining data distributions across loops possible for a compiler and/or run-time system. The flexibility of being able to schedule across loops enables better management of the data locality and parallelism tradeoff. In this paper, we define the loop chaining concept and present three case studies using loop chains in scientific codes: the sparse matrix Jacobi benchmark, a domain-specific library, OP2, used in full applications with unstructured grids, and a domain-specific library, Chombo, used in full applications with structured grids. Preliminary results for the Jacobi benchmark show that a loop chain enabled optimization, full sparse tiling, results in a speedup of as much as 2.68x over a parallelized, blocked implementation on a multicore system with 40 cores.
UR - http://www.scopus.com/inward/record.url?scp=84899746744&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84899746744&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW.2013.68
DO - 10.1109/IPDPSW.2013.68
M3 - Conference contribution
AN - SCOPUS:84899746744
SN - 9780769549798
T3 - Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013
SP - 375
EP - 384
BT - Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013
PB - IEEE Computer Society
Y2 - 22 July 2013 through 26 July 2013
ER -