The Bulk Multicore architecture for improved programmability
Abstract
The Bulk Multicore, a novel general-purpose multicore architecture, was designed to enable a highly programmable environment. Novel support for scalable hardware cache coherence relieved the programmer and runtime system of having to manage the sharing of data. The Bulk Multicore provided high-performance sequential memory consistency to the software and introduced several novel hardware primitives to help minimize the chance of parallel-programming errors. These primitives were intended as the basis for an advanced program development and debugging environment, with features including low-overhead data-race detection, deterministic replay of parallel programs, and high-speed disambiguation of sets of addresses. The key idea in the Bulk Multicore had two parts, the first of which was that the hardware automatically executed all software as a series of atomic blocks, called Chunks, each consisting of a large number of dynamic instructions.
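To make the idea of Chunk-based execution and address-set disambiguation more concrete, the following is a minimal software sketch, not the hardware design itself. It assumes that each Chunk accumulates the set of addresses it reads and writes, and that a committing Chunk squashes any other in-flight Chunk whose addresses overlap its writes. The names `Chunk`, `conflicts`, and `commit` are hypothetical, and exact Python sets stand in for whatever compact address-set representation the hardware uses, which the abstract does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A group of dynamic instructions that commits or squashes as a unit."""
    core: int
    reads: set = field(default_factory=set)   # addresses read so far
    writes: set = field(default_factory=set)  # addresses written so far

    def load(self, addr):
        self.reads.add(addr)

    def store(self, addr):
        self.writes.add(addr)


def conflicts(committing, other):
    """True if `other` touched any address that `committing` wrote."""
    return bool(committing.writes & (other.reads | other.writes))


def commit(committing, in_flight):
    """Return the in-flight chunks that must be squashed and re-executed.

    Assumption: the committing chunk always wins; conflicting chunks lose.
    """
    return [c for c in in_flight
            if c is not committing and conflicts(committing, c)]


# Three chunks running on three cores (addresses are arbitrary examples).
a, b, c = Chunk(core=0), Chunk(core=1), Chunk(core=2)
a.store(0x100)
b.load(0x100)   # overlaps with a's write
c.load(0x200)   # disjoint from a's write

squashed = commit(a, [a, b, c])
print([ch.core for ch in squashed])  # [1]
```

Because conflicts are checked once per Chunk rather than once per memory access, the cost of disambiguation is spread over many instructions, which is one plausible reading of why executing in large atomic blocks helps here.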