Application profiling is an important step in the design and optimization of embedded systems. Accurately identifying and analyzing an application's frequently executed computational kernels is necessary to optimize the system implementation effectively, at both design time and runtime. Most previous profiling approaches are software based, which can incur significant overhead and may be prohibitive or impractical for profiling embedded systems at runtime. In addition, existing profiling methods typically focus on specific tasks executing on a single core and do not provide accurate, holistic profiling across multiple processor cores. Naively combining isolated profiles from multiple processor cores can lead to significant profile inaccuracy. In this paper, we present a hardware-based dynamic application profiler that non-intrusively and accurately profiles software applications in multicore embedded systems. The profiler provides a detailed execution profile for computational kernels and maintains profile accuracy across multiple processor cores. It achieves an average error of less than 0.5% in the percentage execution time of profiled applications.