This paper proposes a new hardware scheduler. As heterogeneous computing becomes prevalent, mapping applications on to multiple processing elements (PEs) proves to be nontrivial. Heterogeneous Earliest Finish Time (HEFT) algorithm is an already existing scheduler that aims to minimize the total execution time of an application. The paradigm of HEFT is such that it accepts an acyclic task graph as input at run-time and assigns/schedules the precompiled atomic tasks to PEs. HEFT stands out among many such schedulers not only in terms of producing shorter schedules but also in terms of its own short execution time. However, in real-time applications, the lower the latency, the better it is. To the best of our knowledge, this work is the only work that implements HEFT in hardware (on FPGA) further lowering its latency from milliseconds to as much as less than a microsecond. Porting HEFT to hardware has been challenging as data dependencies limit the amount of parallelism. Design of an efficient memory access pattern as well as an “incremental sorter” were key enablers in reducing the latency of the hardware implementation. We also integrated our FPGA-HEFT into an ARM-based SoC and validated its functionality using a realistic workload.