TY - GEN
T1 - There goes the neighborhood
T2 - 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013
AU - Bhatele, Abhinav
AU - Mohror, Kathryn
AU - Langer, Steven H.
AU - Isaacs, Katherine E.
PY - 2013
Y1 - 2013
N2 - Predictable performance is important for understanding and alleviating application performance issues; quantifying the effects of source code, compiler, or system software changes; estimating the time required for batch jobs; and determining the allocation requests for proposals. Our experiments show that on a Cray XE system, the execution time of a communication-heavy parallel application ranges from 28% faster to 41% slower than the average observed performance. Blue Gene systems, on the other hand, demonstrate no noticeable run-to-run variability. In this paper, we focus on Cray machines and investigate potential causes for performance variability such as OS jitter, shape of the allocated partition, and interference from other jobs sharing the same network links. Reducing such variability could improve overall throughput at a computer center and save energy costs.
KW - Communication performance
KW - Interference
KW - Resource management
KW - System noise
KW - Torus networks
UR - http://www.scopus.com/inward/record.url?scp=84899698707&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84899698707&partnerID=8YFLogxK
U2 - 10.1145/2503210.2503247
DO - 10.1145/2503210.2503247
M3 - Conference contribution
AN - SCOPUS:84899698707
SN - 9781450323789
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2013
PB - IEEE Computer Society
Y2 - 17 November 2013 through 22 November 2013
ER -