TY - GEN
T1 - Practical resource management in power-constrained, high performance computing
AU - Patki, Tapasya
AU - Lowenthal, David K.
AU - Sasidharan, Anjana
AU - Maiterth, Matthias
AU - Rountree, Barry L.
AU - Schulz, Martin
AU - De Supinski, Bronis R.
N1 - Publisher Copyright: © 2015 ACM.
PY - 2015/6/15
Y1 - 2015/6/15
N2 - Power management is one of the key research challenges on the path to exascale. Supercomputers today are designed to be worst-case power provisioned, leading to two main problems: limited application performance and under-utilization of procured power. In this paper, we propose RMAP, a practical, low-overhead resource manager targeted at future power-constrained clusters. The goals for RMAP are to improve application performance as well as system power utilization, and thus minimize the average turnaround time for all jobs. Within RMAP, we design and analyze an adaptive policy, which derives job-level power bounds in a fair-share manner and supports overprovisioning and power-aware backfilling. Our results show that our new policy increases system power utilization while adhering to strict job-level power bounds and leads to 31% (19% on average) and 54% (36% on average) faster average turnaround time when compared to worst-case provisioning and naive overprovisioning, respectively.
AB - Power management is one of the key research challenges on the path to exascale. Supercomputers today are designed to be worst-case power provisioned, leading to two main problems: limited application performance and under-utilization of procured power. In this paper, we propose RMAP, a practical, low-overhead resource manager targeted at future power-constrained clusters. The goals for RMAP are to improve application performance as well as system power utilization, and thus minimize the average turnaround time for all jobs. Within RMAP, we design and analyze an adaptive policy, which derives job-level power bounds in a fair-share manner and supports overprovisioning and power-aware backfilling. Our results show that our new policy increases system power utilization while adhering to strict job-level power bounds and leads to 31% (19% on average) and 54% (36% on average) faster average turnaround time when compared to worst-case provisioning and naive overprovisioning, respectively.
KW - Power-constrained HPC
KW - Resource Management
UR - http://www.scopus.com/inward/record.url?scp=84987740923&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84987740923&partnerID=8YFLogxK
U2 - 10.1145/2749246.2749262
DO - 10.1145/2749246.2749262
M3 - Conference contribution
AN - SCOPUS:84987740923
T3 - HPDC 2015 - Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing
SP - 121
EP - 132
BT - HPDC 2015 - Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing
PB - Association for Computing Machinery, Inc
T2 - 24th ACM Symposium on High-Performance Parallel and Distributed Computing, HPDC 2015
Y2 - 15 June 2015 through 19 June 2015
ER -