TY - GEN
T1 - Exploiting redundancy for cost-effective, time-constrained execution of HPC applications on Amazon EC2
AU - Marathe, Aniruddha
AU - Harris, Rachel
AU - Lowenthal, David K.
AU - De Supinski, Bronis R.
AU - Rountree, Barry
AU - Schulz, Martin
PY - 2014
Y1 - 2014
AB - The use of clouds to execute high-performance computing (HPC) applications has greatly increased recently. Clouds provide several potential advantages over traditional supercomputers and in-house clusters. The most popular cloud is currently Amazon EC2, which provides a fixed-cost option (called on-demand) and a variable-cost, auction-based option (called the spot market). The spot market trades lower cost for potential interruptions that necessitate checkpointing; if the market price exceeds the bid price, a node is taken away from the user without warning. We explore techniques to maximize performance per dollar given a time constraint within which an application must complete. Specifically, we design and implement multiple techniques to reduce expected cost by exploiting redundancy in the EC2 spot market. We then design an adaptive algorithm that selects a scheduling algorithm and determines the bid price. We show that our adaptive algorithm executes programs up to 7x cheaper than using the on-demand market and up to 44% cheaper than the best non-redundant, spot-market algorithm.
KW - Cloud
KW - Cost
KW - Fault-tolerance
KW - Resource provisioning
UR - http://www.scopus.com/inward/record.url?scp=84904438124&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84904438124&partnerID=8YFLogxK
U2 - 10.1145/2600212.2600226
DO - 10.1145/2600212.2600226
M3 - Conference contribution
AN - SCOPUS:84904438124
SN - 9781450327480
T3 - HPDC 2014 - Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing
SP - 279
EP - 290
BT - HPDC 2014 - Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing
PB - Association for Computing Machinery
T2 - 23rd ACM Symposium on High-Performance Parallel and Distributed Computing, HPDC 2014
Y2 - 23 June 2014 through 27 June 2014
ER -