On performance and space usage improvements for parallelized compiled APL code
Abstract
Loop combination is a traditional optimization technique in APL compilers, but it may introduce dependencies into the combined loop. We propose an analysis method by which the compiler can track how the available parallelism changes as high-level primitives are combined. This analysis is necessary when the compiler must decide the trade-off between more parallelism and further combination. We also show how both space usage and performance improve when dynamic memory allocation is implemented using system calls with the aid of garbage collection. A modification of the memory management scheme can also increase the available parallelism. Our experimental results indicate that performance and space usage improve appreciably with these enhancements.
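As an illustration of the first point, the following minimal sketch shows how combining two loops can carry a dependency into the combined loop. It is not taken from the paper: the function names `separate` and `combined` are hypothetical, and the example assumes an APL expression of the form `+\2×A` (a running sum over a doubled vector), with Python standing in for the compiled code.

```python
from itertools import accumulate


def separate(a):
    """Two loops, one per primitive (assumed expression: +\\2×A)."""
    # Elementwise multiply: every iteration is independent,
    # so this loop could run fully in parallel.
    doubled = [2 * x for x in a]
    # Running sum (scan): each result depends on the previous one.
    return list(accumulate(doubled))


def combined(a):
    """One combined loop: no temporary vector, but the multiply now sits
    inside a loop that carries a dependency through `total`."""
    out = []
    total = 0
    for x in a:
        total += 2 * x  # loop-carried dependency on `total`
        out.append(total)
    return out
```

Both functions compute the same result. The combined form saves the intermediate vector, while the separate form keeps the elementwise loop free of dependencies. This is the kind of trade-off the compiler has to weigh.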