TY - GEN
T1 - Iterator-based optimization of imperfectly-nested loops
AU - Feshbach, Daniel
AU - Glaser, Mary
AU - Strout, Michelle
AU - Wonnacott, David G.
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/8/3
Y1 - 2018/8/3
N2 - Effective optimization of dense array codes often depends upon the selection of the appropriate execution order for the iterations of nested loops. Tools based on the Polyhedral Model have demonstrated dramatic success in performing such optimizations on many such codes, but others remain an area of active research, leaving programmers to optimize code in other ways. Bertolacci et. al demonstrated that programmer-defined iterators can be used to explore iteration-space reorderings, and that Cray's compiler for the Chapel language can optimize such codes to be competitive with polyhedral tools. This 'iterator-based' approach allows programmers to explore iteration orderings not identified by automatic optimizers, but was only demonstrated for perfectly-nested loops, and lacked any system for warning about an iterator that would produce an incorrect result. We have now addressed these shortcomings of iterator-based loop optimization, and explored the use of our improved techniques to optimize the imperfectly-nested loops that form the core of Nussinov's algorithm for RNA secondary-structure prediction. Our C++ iterator provides performance that equals the fastest C code, several times faster than was achieved by using the same C compiler on the code with the original iteration ordering, or the code produced by the Pluto loop optimizer. Our Chapel iterators produce run-time that is competitive with the equivalent iterator-free Chapel code, though the Chapel performance does not equal that of the C/C++ code. We have also implemented an iterator that produces an incorrect-but-fast version of Nussinov's algorithm, and used this iterator to illustrate our approaches to error-detection. Manual application of our compile-time error-detection algorithm (which has yet to be integrated into a compiler) identifies this error, as does the run-time approach that we use for codes on which the static test proves inconclusive.
AB - Effective optimization of dense array codes often depends upon the selection of the appropriate execution order for the iterations of nested loops. Tools based on the Polyhedral Model have demonstrated dramatic success in performing such optimizations on many such codes, but others remain an area of active research, leaving programmers to optimize code in other ways. Bertolacci et. al demonstrated that programmer-defined iterators can be used to explore iteration-space reorderings, and that Cray's compiler for the Chapel language can optimize such codes to be competitive with polyhedral tools. This 'iterator-based' approach allows programmers to explore iteration orderings not identified by automatic optimizers, but was only demonstrated for perfectly-nested loops, and lacked any system for warning about an iterator that would produce an incorrect result. We have now addressed these shortcomings of iterator-based loop optimization, and explored the use of our improved techniques to optimize the imperfectly-nested loops that form the core of Nussinov's algorithm for RNA secondary-structure prediction. Our C++ iterator provides performance that equals the fastest C code, several times faster than was achieved by using the same C compiler on the code with the original iteration ordering, or the code produced by the Pluto loop optimizer. Our Chapel iterators produce run-time that is competitive with the equivalent iterator-free Chapel code, though the Chapel performance does not equal that of the C/C++ code. We have also implemented an iterator that produces an incorrect-but-fast version of Nussinov's algorithm, and used this iterator to illustrate our approaches to error-detection. Manual application of our compile-time error-detection algorithm (which has yet to be integrated into a compiler) identifies this error, as does the run-time approach that we use for codes on which the static test proves inconclusive.
KW - Iterator
KW - Loop transformation
KW - Optimization
UR - http://www.scopus.com/inward/record.url?scp=85052233375&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052233375&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW.2018.00144
DO - 10.1109/IPDPSW.2018.00144
M3 - Conference contribution
AN - SCOPUS:85052233375
SN - 9781538655559
T3 - Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018
SP - 906
EP - 914
BT - Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018
Y2 - 21 May 2018 through 25 May 2018
ER -