A low-communication, parallel algorithm for solving PDEs based on range decomposition
Abstract
This paper proposes a new, low-communication algorithm for solving PDEs on massively parallel computers. The range decomposition (RD) algorithm exposes coarse-grain parallelism by applying nested iteration and adaptive mesh refinement locally before performing a global communication step. Just a few such steps are observed to be sufficient to obtain accuracy within a small multiple of discretization error. The target applications are petascale and exascale machines, where hierarchical parallelism is required and traditional parallel numerical PDE communication patterns are costly because of message latency. The RD algorithm uses a partition of unity to equally distribute the error, and thus, the work. The computational advantages of this approach are that the decomposed problems can be solved in parallel without any communication until the partitioned solutions are summed. This offers potential advantages in the paradigm of expensive communication but very cheap computation. This paper introduces the method and explains the details of the communication step. Two performance models are developed, showing that the latency cost associated with a traditional parallel implementation of nested iteration is proportional to log(P)2, whereas the RD method reduces the communication latency to log(P), while maintaining similar bandwidth costs. Numerical results for two problems, Laplace and advection diffusion, demonstrate the enhanced performance, and a heuristic argument explains why the method converges quickly. Copyright © 2016 John Wiley & Sons, Ltd.