TY - JOUR
T1 - Using fine-grain threads and run-time decision making in parallel computing
AU - Lowenthal, David K.
AU - Freeh, Vincent W.
AU - Andrews, Gregory R.
N1 - Funding Information:
This research was supported by NSF Grants CCR-9415303 and CDA-8822652. E-mail: {dkl,vin,greg}@cs.arizona.edu.
PY - 1996/8/25
Y1 - 1996/8/25
N2 - Programming distributed-memory multiprocessors and networks of workstations requires deciding what can execute concurrently, how processes communicate, and where data is placed. These decisions can be made statically by a programmer or compiler, or they can be made dynamically at run time. Using run-time decisions leads to a simpler interface - because decisions are implicit - and it can lead to better decisions - because more information is available. This paper examines the costs, benefits, and details of making decisions at run time. The starting point is explicit fine-grain parallelism with any number (even thousands) of threads. Five specific techniques are considered: (1) implicitly coarsening the granularity of parallelism, (2) using implicit communication implemented by a distributed shared memory, (3) overlapping computation and communication, (4) adaptively moving threads and data between nodes to minimize communication and balance load, and (5) dynamically remapping data to pages to avoid false sharing. Details are given on the performance of each of these techniques as well as on their overall performance for several scientific applications.
AB - Programming distributed-memory multiprocessors and networks of workstations requires deciding what can execute concurrently, how processes communicate, and where data is placed. These decisions can be made statically by a programmer or compiler, or they can be made dynamically at run time. Using run-time decisions leads to a simpler interface - because decisions are implicit - and it can lead to better decisions - because more information is available. This paper examines the costs, benefits, and details of making decisions at run time. The starting point is explicit fine-grain parallelism with any number (even thousands) of threads. Five specific techniques are considered: (1) implicitly coarsening the granularity of parallelism, (2) using implicit communication implemented by a distributed shared memory, (3) overlapping computation and communication, (4) adaptively moving threads and data between nodes to minimize communication and balance load, and (5) dynamically remapping data to pages to avoid false sharing. Details are given on the performance of each of these techniques as well as on their overall performance for several scientific applications.
UR - http://www.scopus.com/inward/record.url?scp=0030601294&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0030601294&partnerID=8YFLogxK
U2 - 10.1006/jpdc.1996.0106
DO - 10.1006/jpdc.1996.0106
M3 - Article
AN - SCOPUS:0030601294
SN - 0743-7315
VL - 37
SP - 41
EP - 54
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
IS - 1
ER -