TY - GEN
T1 - A Study of Runtime Adaptive Prefetching for STTRAM L1 Caches
AU - Kuan, Kyle
AU - Adegbija, Tosiron
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/10
Y1 - 2020/10
N2 - Spin-Transfer Torque RAM (STTRAM) is a promising alternative to SRAM in on-chip caches due to several advantages. These advantages include non-volatility, low leakage, high integration density, and CMOS compatibility. Prior studies have shown that relaxing and adapting the STTRAM retention time to runtime application needs can substantially reduce overall cache energy without significant latency overheads, due to the lower STTRAM write energy and latency at shorter retention times. In this paper, as a first step towards efficient prefetching across the STTRAM cache hierarchy, we study prefetching in reduced-retention STTRAM L1 caches. Using SPEC CPU 2017 benchmarks, we analyze the energy and latency impact of different prefetch distances at different STTRAM cache retention times for different applications. We show that expired-unused-prefetches (the number of unused prefetches expired by the reduced-retention-time STTRAM cache) can accurately determine the best retention time for energy consumption and access latency. This new metric can also provide insights into the best prefetch distance for memory bandwidth consumption and prefetch accuracy. Based on our analysis and insights, we propose Prefetch-Aware Retention time Tuning (PART) and Retention time-based Prefetch Control (RPC). Compared to a base STTRAM cache, PART and RPC collectively reduced the average cache energy and latency by 22.24% and 24.59%, respectively. When the base architecture was augmented with the state-of-the-art near-side prefetch throttling (NST), PART+RPC reduced the average cache energy and latency by 3.50% and 3.59%, respectively, and reduced the hardware overhead by 54.55%.
AB - Spin-Transfer Torque RAM (STTRAM) is a promising alternative to SRAM in on-chip caches due to several advantages. These advantages include non-volatility, low leakage, high integration density, and CMOS compatibility. Prior studies have shown that relaxing and adapting the STTRAM retention time to runtime application needs can substantially reduce overall cache energy without significant latency overheads, due to the lower STTRAM write energy and latency at shorter retention times. In this paper, as a first step towards efficient prefetching across the STTRAM cache hierarchy, we study prefetching in reduced-retention STTRAM L1 caches. Using SPEC CPU 2017 benchmarks, we analyze the energy and latency impact of different prefetch distances at different STTRAM cache retention times for different applications. We show that expired-unused-prefetches (the number of unused prefetches expired by the reduced-retention-time STTRAM cache) can accurately determine the best retention time for energy consumption and access latency. This new metric can also provide insights into the best prefetch distance for memory bandwidth consumption and prefetch accuracy. Based on our analysis and insights, we propose Prefetch-Aware Retention time Tuning (PART) and Retention time-based Prefetch Control (RPC). Compared to a base STTRAM cache, PART and RPC collectively reduced the average cache energy and latency by 22.24% and 24.59%, respectively. When the base architecture was augmented with the state-of-the-art near-side prefetch throttling (NST), PART+RPC reduced the average cache energy and latency by 3.50% and 3.59%, respectively, and reduced the hardware overhead by 54.55%.
KW - Spin Transfer Torque RAM; STTRAM; prefetcher; stride prefetcher; L1 cache; prefetching; GEM5; SPEC CPU 2017
UR - http://www.scopus.com/inward/record.url?scp=85098890084&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098890084&partnerID=8YFLogxK
U2 - 10.1109/ICCD50377.2020.00051
DO - 10.1109/ICCD50377.2020.00051
M3 - Conference contribution
AN - SCOPUS:85098890084
T3 - Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors
SP - 247
EP - 254
BT - Proceedings - 2020 IEEE 38th International Conference on Computer Design, ICCD 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 38th IEEE International Conference on Computer Design, ICCD 2020
Y2 - 18 October 2020 through 21 October 2020
ER -