Abstract
Structured-grid PDE solver frameworks parallelize over boxes, which are rectangular domains of cells or faces in a structured grid. In the Chombo framework, box sizes are typically 16³ or 32³, but larger boxes such as 128³ have a smaller surface-to-volume ratio and therefore incur less storage, copying, and/or ghost-cell communication overhead. Unfortunately, current on-node parallelization schemes perform poorly for these larger box sizes. In this paper, we investigate 30 different inter-loop optimization strategies and demonstrate the parallel scaling advantages of some of these variants on NUMA multicore nodes. Shifted, fused, and communication-avoiding variants for 128³ boxes achieve close to ideal parallel scaling and come close to matching the performance of 16³ boxes on three different multicore systems, for a benchmark that is a proxy for program idioms found in Computational Fluid Dynamics (CFD) codes.
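The surface-to-volume argument above can be illustrated with a small calculation. The following is a minimal sketch, not taken from the paper: the function name `ghost_overhead` and the ghost-layer width of 2 cells per side are assumptions chosen for illustration only. It computes the ratio of ghost cells to interior cells for a cubic box.

```python
def ghost_overhead(n, ghost=2):
    """Ratio of ghost cells to interior cells for an n^3 box.

    `ghost` is the number of ghost layers on each side of the box
    (an assumed value; the actual width depends on the stencil).
    """
    interior = n ** 3
    total = (n + 2 * ghost) ** 3  # box extended by ghost layers on every side
    return (total - interior) / interior


if __name__ == "__main__":
    for n in (16, 32, 128):
        print(f"{n}^3 box: ghost/interior = {ghost_overhead(n):.3f}")
```

Under this assumption, the ghost-cell count is nearly as large as the interior for a 16³ box, while for a 128³ box it falls to roughly a tenth of the interior, which is the kind of overhead reduction that motivates larger boxes.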
Original language | English (US) |
---|---|
Article number | 7013052 |
Pages (from-to) | 793-804 |
Number of pages | 12 |
Journal | International Conference for High Performance Computing, Networking, Storage and Analysis, SC |
Volume | 2015-January |
Issue number | January |
State | Published - Jan 16 2014 |
Externally published | Yes |
Event | International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014 - New Orleans, United States; Duration: Nov 16 2014 → Nov 21 2014 |
ASJC Scopus subject areas
- Computer Networks and Communications
- Computer Science Applications
- Hardware and Architecture
- Software