TY - GEN
T1 - FPGA-based Minimal Latency HEFT Scheduler for Heterogeneous Computing
AU - Aliyev, Ilkin
AU - Mack, Joshua
AU - Kumbhare, Nirmal
AU - Akoglu, Ali
AU - Fatih Ugurdag, H.
N1 - Publisher Copyright: © 2021 IEEE
PY - 2021
Y1 - 2021
AB - This paper proposes a new hardware scheduler. As heterogeneous computing becomes prevalent, mapping applications onto multiple processing elements (PEs) proves to be nontrivial. The Heterogeneous Earliest Finish Time (HEFT) algorithm is an existing scheduler that aims to minimize the total execution time of an application. HEFT accepts an acyclic task graph as input at run-time and assigns/schedules the precompiled atomic tasks to PEs. HEFT stands out among many such schedulers not only for producing shorter schedules but also for its own short execution time. However, real-time applications benefit from the lowest possible scheduling latency. To the best of our knowledge, this is the only work that implements HEFT in hardware (on FPGA), lowering its latency from milliseconds to as little as under a microsecond. Porting HEFT to hardware is challenging because data dependencies limit the amount of parallelism. An efficient memory access pattern and an “incremental sorter” were key enablers in reducing the latency of the hardware implementation. We also integrated our FPGA-HEFT into an ARM-based SoC and validated its functionality using a realistic workload.
KW - DSSoC
KW - Hardware scheduler
KW - Heterogeneous computing
KW - Task scheduling
UR - http://www.scopus.com/inward/record.url?scp=85125855766&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125855766&partnerID=8YFLogxK
U2 - 10.1109/UBMK52708.2021.9558926
DO - 10.1109/UBMK52708.2021.9558926
M3 - Conference contribution
AN - SCOPUS:85125855766
T3 - Proceedings - 6th International Conference on Computer Science and Engineering, UBMK 2021
SP - 244
EP - 248
BT - Proceedings - 6th International Conference on Computer Science and Engineering, UBMK 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th International Conference on Computer Science and Engineering, UBMK 2021
Y2 - 15 September 2021 through 17 September 2021
ER -