TY - GEN
T1 - Analyzing and mitigating the impact of manufacturing variability in power-constrained supercomputing
AU - Inadomi, Yuichi
AU - Patki, Tapasya
AU - Inoue, Koji
AU - Aoyagi, Mutsumi
AU - Rountree, Barry
AU - Schulz, Martin
AU - Lowenthal, David
AU - Wada, Yasutaka
AU - Fukazawa, Keiichiro
AU - Ueda, Masatsugu
AU - Kondo, Masaaki
AU - Miyoshi, Ikuo
N1 - Funding Information:
We extend our thanks to Livermore Computing and RIIT of Kyushu University for providing us with the resources and support to conduct the large-scale power measurements presented in this paper. We also want to thank Timothy Meyer and Neha Gholkar for their initial help with gathering data on the Teller and Vulcan systems. Part of this work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-669812). This material was also based upon work supported by the National Science Foundation under Grant No. 1216829. Additionally, this work was supported by the Japan Science and Technology Agency (JST) CREST program, A Power Management Framework for Post Peta-Scale Supercomputers.
Publisher Copyright:
© 2015 ACM.
PY - 2015/11/15
Y1 - 2015/11/15
AB - A key challenge in next-generation supercomputing is to effectively schedule limited power resources. Modern processors suffer from increasingly large power variations due to the chip manufacturing process. These variations lead to power inhomogeneity in current systems and manifest as performance inhomogeneity in power-constrained environments, drastically limiting supercomputing performance. We present a first-of-its-kind study of manufacturing variability on four production HPC systems spanning four microarchitectures, analyze its impact on HPC applications, and propose a novel variation-aware power budgeting scheme to maximize effective application performance. Our low-cost and scalable budgeting algorithm strives to achieve performance homogeneity under a power constraint by deriving application-specific, module-level power allocations. Experimental results using a 1,920-socket system show up to 5.4X speedup, with an average speedup of 1.8X across all benchmarks, when compared to a variation-unaware power allocation scheme.
KW - performance modeling
KW - power-constrained HPC
UR - http://www.scopus.com/inward/record.url?scp=84966687194&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84966687194&partnerID=8YFLogxK
U2 - 10.1145/2807591.2807638
DO - 10.1145/2807591.2807638
M3 - Conference contribution
AN - SCOPUS:84966687194
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2015
PB - IEEE Computer Society
T2 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015
Y2 - 15 November 2015 through 20 November 2015
ER -