Data traffic reduction schemes for cholesky factorization on asynchronous multiprocessor systems
Abstract
For multiprocessor systems with two level memory hierarchy, The communication requirements of parallel Cholesky factorization of dense and sparse symmetric, positive definite matrices are analyzed. The data, traffic associated with computing the Cholesky factor of an n × n dense matrix using nα processors, α ≤ 2, is shown to be ω(n2,2+α/2), assuming that the computational load is uniformly distributed. For an n × n sparse matrix, representing a √n × √n regular grid graph, the corresponding data traffic is shown to be ω(n1+α/2), α ≤ 1. Partitioning schemes that, are variations of block assignment scheme are described. The data traffic generated by these schemes are asymptotically optimal and these Scheines allow efficient use of up to O(n2) and O(n) processors in the dense and the sparse case, re-speclively. The block based partitioning schemes are shown to provide a better utilization of (he data accessed from the shared memory and reduce the total data traffic as compared to the schemes based on the column-wise wrap around assignment.