Efficient computation of communicator variables for programs with unstructured parallelism
Abstract
We present an algorithm to determine communicator variables in parallel programs. If communicator variables are accessed in program order and accesses to other shared variables are not reordered with respect to communicators, then program executions are sequentially consistent. Computing communicators is an efficient and effective alternative to delay set computation. The algorithm does not require a thread and whole-program control-flow model and tolerates the typical approximations that static program analyses make for threads and data. These properties make the algorithm suitable to handle multi-threaded object-oriented programs with unstructured parallelism. We demonstrate on several multi-threaded Java programs that the algorithm is effective in reducing the number of fences at memory access statements compared to a naive fence insertion algorithm (the reduction is on average 28%) and report the runtime overhead caused by the fences (between 0% and 231%, average 81%). © Springer-Verlag Berlin Heidelberg 2005.