Evaluating the Potential of Coscheduling on High-Performance Computing Systems

Jason Hall, Arjun Lathi, David K. Lowenthal, Tapasya Patki

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Modern high-performance computing (HPC) system designs have converged to heavyweight nodes with growing numbers of processors. If schedulers on these systems allocate nodes in an exclusive and dedicated manner, many HPC applications and scientific workflows will be unable to fully utilize and benefit from such hardware, because at such extreme scale it will be difficult for them to use all of the node-level resources. In this paper, we investigate the potential of moving away from dedicated node allocation and instead using intelligent coscheduling (where multiple jobs can share node-level resources) to improve node utilization and therefore job turnaround time. We design and implement a coscheduling simulator and, using traces of 100K jobs from a high-end HPC cluster with 1158 nodes, demonstrate that coscheduling can improve average turnaround times by up to 18% when compared to EASY backfilling. Our results indicate that coscheduling has the potential to be a more efficient way to schedule jobs on high-end machines in terms of both turnaround time and system and component utilization.

Original language: English (US)
Title of host publication: Job Scheduling Strategies for Parallel Processing - 26th Workshop, JSSPP 2023, Revised Selected Papers
Editors: Dalibor Klusáček, Julita Corbalán, Gonzalo P. Rodrigo
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 155-172
Number of pages: 18
ISBN (Print): 9783031439421
DOIs
State: Published - 2023
Event: 26th workshop on Job Scheduling Strategies for Parallel Processing, JSSPP 2023 - St. Petersburg, United States
Duration: May 19 2023 → May 19 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14283 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th workshop on Job Scheduling Strategies for Parallel Processing, JSSPP 2023
Country/Territory: United States
City: St. Petersburg
Period: 5/19/23 → 5/19/23

Keywords

  • coscheduling
  • high-performance computing

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
