Jigsaw: A High-Utilization, Interference-Free Job Scheduler for Fat-Tree Clusters

Staci A. Smith, David K. Lowenthal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Jobs on HPC clusters can suffer significant performance degradation due to inter-job network interference. Approaches to mitigating this interference primarily focus on reactive routing schemes. A better approach, in that it completely eliminates inter-job interference, is to implement scheduling policies that proactively enforce network isolation for every job. However, existing schedulers that allocate isolated partitions lower system utilization, which creates a barrier to adoption. Accordingly, we design and implement Jigsaw, a new job-isolating scheduling approach for three-level fat-trees that overcomes this barrier. Jigsaw typically achieves system utilization of 95-96% while guaranteeing dedicated network links to jobs. In scenarios where jobs gain even modest performance improvements from interference freedom, Jigsaw typically yields lower job turnaround times and higher throughput than traditional job scheduling. To the best of our knowledge, Jigsaw is the first scheduler to eliminate inter-job network interference while maintaining high system utilization, leading to improved job and system performance.
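To make the utilization barrier of isolated partitions concrete, here is a minimal sketch under stated assumptions; it is not Jigsaw's algorithm. It assumes a hypothetical, coarse-grained isolation policy that gives each job whole leaf switches of a k-ary fat-tree (k/2 nodes per leaf), so any nodes left over on a job's last leaf sit idle. The function names and job sizes are illustrative only.

```python
import math


def leaf_granular_allocation(job_sizes, nodes_per_leaf):
    """Return (requested, allocated) node counts when every job is rounded
    up to whole leaf switches. This models a HYPOTHETICAL isolation policy,
    not Jigsaw's allocation scheme."""
    requested = sum(job_sizes)
    # Each job occupies ceil(n / nodes_per_leaf) full leaves; unused nodes
    # on its last leaf cannot be given to another job under this policy.
    allocated = sum(
        math.ceil(n / nodes_per_leaf) * nodes_per_leaf for n in job_sizes
    )
    return requested, allocated


def utilization(job_sizes, nodes_per_leaf):
    """Fraction of allocated nodes that jobs actually requested."""
    requested, allocated = leaf_granular_allocation(job_sizes, nodes_per_leaf)
    return requested / allocated


if __name__ == "__main__":
    # k = 16 port switches -> 16 / 2 = 8 nodes per leaf (assumed example).
    print(utilization([5, 12, 8, 30], 8))  # 0.859375
```

Here 55 requested nodes consume 64 allocated nodes, so about 14% of the machine idles purely from rounding to partition boundaries; jobs whose sizes are multiples of the leaf size lose nothing. This fragmentation is one simple way isolation can cost utilization, which is the gap the abstract says Jigsaw closes.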

Original language: English (US)
Title of host publication: HPDC 2021 - Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing
Publisher: Association for Computing Machinery, Inc
Pages: 201-213
Number of pages: 13
ISBN (Electronic): 9781450382175
State: Published - Jun 21 2021
Event: 30th International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2021 - Virtual, Online, Sweden
Duration: Jun 21 2021 → Jun 25 2021

Publication series

Name: HPDC 2021 - Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing

Conference

Conference: 30th International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2021
Country/Territory: Sweden
City: Virtual, Online
Period: 6/21/21 → 6/25/21

Keywords

  • fat-tree
  • inter-job network interference
  • scheduling
  • utilization

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Computational Theory and Mathematics
