TY - GEN
T1 - Cache injection for parallel applications
AU - León, Edgar A.
AU - Riesen, Rolf
AU - Ferreira, Kurt B.
AU - MacCabe, Arthur B.
PY - 2011
Y1 - 2011
AB - For two decades, the memory wall has affected many applications in their ability to benefit from improvements in processor speed. Cache injection addresses this disparity for I/O by writing data into a processor's cache directly from the I/O bus. This technique reduces data latency and, unlike data prefetching, improves memory bandwidth utilization. These improvements are significant for data-intensive applications whose performance is dominated by compulsory cache misses. We present an empirical evaluation of three injection policies and their effect on the performance of two parallel applications and several collective micro-benchmarks. We demonstrate that the effectiveness of cache injection on performance is a function of the communication characteristics of applications, the injection policy, the target cache, and the severity of the memory wall. For example, we show that injecting message payloads to the L3 cache can improve the performance of network-bandwidth limited applications. In addition, we show that cache injection improves the performance of several collective operations, but not all-to-all operations (implementation dependent). Our study shows negligible pollution to the target caches.
KW - cache injection
KW - memory wall
UR - https://www.scopus.com/pages/publications/79960507433
UR - https://www.scopus.com/pages/publications/79960507433#tab=citedBy
U2 - 10.1145/1996130.1996135
DO - 10.1145/1996130.1996135
M3 - Conference contribution
AN - SCOPUS:79960507433
SN - 9781450305525
T3 - Proceedings of the IEEE International Symposium on High Performance Distributed Computing
SP - 15
EP - 26
BT - HPDC'11 - Proceedings of the 20th International Symposium on High Performance Distributed Computing
T2 - 20th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC'11
Y2 - 8 June 2011 through 11 June 2011
ER -