TY - GEN
T1 - Lightweight, high-resolution monitoring for troubleshooting production systems
AU - Bhatia, Sapan
AU - Kumar, Abhishek
AU - Fiuczynski, Marc E.
AU - Peterson, Larry
N1 - Funding Information:
We gratefully acknowledge feedback from the reviewers, our shepherd, Anthony Joseph, Andy Bavier, Charles Consel, Julia Lawall, Murtaza Motiwala, Jennifer Rexford and Vytautas Valancius. This work was funded, in part, by NSF grants CNS-0335214 and CNS-0520053.
PY - 2008
Y1 - 2008
N2 - Production systems are commonly plagued by intermittent problems that are difficult to diagnose. This paper describes a new diagnostic tool, called Chopstix, that continuously collects profiles of low-level OS events (e.g., scheduling, L2 cache misses, CPU utilization, I/O operations, page allocation, locking) at the granularity of executables, procedures and instructions. Chopstix then reconstructs these events offline for analysis. We have used Chopstix to diagnose several elusive problems in a large-scale production system, thereby reducing these intermittent problems to reproducible bugs that can be debugged using standard techniques. The key to Chopstix is an approximate data collection strategy that incurs very low overhead. An evaluation shows Chopstix requires under 1% of the CPU, under 256KB of RAM, and under 16MB of disk space per day to collect a rich set of system-wide data.
AB - Production systems are commonly plagued by intermittent problems that are difficult to diagnose. This paper describes a new diagnostic tool, called Chopstix, that continuously collects profiles of low-level OS events (e.g., scheduling, L2 cache misses, CPU utilization, I/O operations, page allocation, locking) at the granularity of executables, procedures and instructions. Chopstix then reconstructs these events offline for analysis. We have used Chopstix to diagnose several elusive problems in a large-scale production system, thereby reducing these intermittent problems to reproducible bugs that can be debugged using standard techniques. The key to Chopstix is an approximate data collection strategy that incurs very low overhead. An evaluation shows Chopstix requires under 1% of the CPU, under 256KB of RAM, and under 16MB of disk space per day to collect a rich set of system-wide data.
UR - http://www.scopus.com/inward/record.url?scp=85076917215&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076917215&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85076917215
T3 - Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008
SP - 103
EP - 116
BT - Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008
PB - USENIX Association
T2 - 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008
Y2 - 8 December 2008 through 10 December 2008
ER -