Abstract
GF11 is a parallel computer operational at IBM's T.J. Watson Research Center. It is based on the SIMD (Single Instruction Multiple Data) model of parallel computing. GF11 attains its peak execution rate of 11.3 GigaFlops by using 566 identical processing elements, each capable of delivering 20 MegaFlops. Each processor has its own 64 Kb static RAM that can access a 32-bit word on each floating point operation, a 2 Mb dynamic RAM that operates at one fourth of the SRAM speed, and a 1 Kb register file that provides four accesses per floating point operation. The processors communicate through a 576×576 Benes network, organized as three stages of 24×24 crossbar switches. The network provides 11.3 Gb/sec of communication bandwidth to the processors and allows the processors to dynamically reconfigure themselves into arrays of various dimensions and sizes, or into other interconnection patterns such as a tree or a hypercube. This reconfiguration can take place on every word transfer without sacrificing bandwidth. GF11 has several architectural enhancements that circumvent the limitations of the standard SIMD model, such as the ability to perform multiple operations in every instruction and the ability to modify the operations occurring within individual processors based on processor-specific data. Preliminary benchmarking efforts on some applications indicate that near-peak performance can be sustained on most applications, including some that were previously believed to be ill suited to SIMD machines. Only minimal restructuring of programs and algorithms is required to achieve this performance. This paper summarizes the architecture of GF11 and discusses the implementations of Finite Element analysis, LU decomposition, Gaussian Elimination, and Fast Fourier Transform to illustrate GF11's ability to deliver good performance with minor program restructuring. © 1993.
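The headline figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constants are the values quoted in the abstract, while the function names and the assumption that each network stage is a column of identical 24×24 crossbars covering all 576 ports are ours, not taken from the paper.

```python
# Figures quoted in the abstract.
NUM_PROCESSORS = 566          # identical processing elements
MFLOPS_PER_PROCESSOR = 20     # peak rate of one processing element
NETWORK_PORTS = 576           # size of the 576x576 Benes network
CROSSBAR_PORTS = 24           # each switch is a 24x24 crossbar
NUM_STAGES = 3                # stages of crossbar switches


def peak_gflops(num_processors: int, mflops_each: float) -> float:
    """Aggregate peak rate in GigaFlops (hypothetical helper)."""
    return num_processors * mflops_each / 1000.0


def crossbars_per_stage(network_ports: int, crossbar_ports: int) -> int:
    """Crossbars needed in one stage, assuming (our assumption) that a
    stage is a column of identical crossbars covering every port."""
    if network_ports % crossbar_ports != 0:
        raise ValueError("ports must divide evenly among crossbars")
    return network_ports // crossbar_ports


peak = peak_gflops(NUM_PROCESSORS, MFLOPS_PER_PROCESSOR)
per_stage = crossbars_per_stage(NETWORK_PORTS, CROSSBAR_PORTS)
total_crossbars = per_stage * NUM_STAGES
# Network ports not attached to one of the 566 processing elements;
# the abstract does not say how these are used.
unattached_ports = NETWORK_PORTS - NUM_PROCESSORS

print(f"peak rate: {peak:.2f} GigaFlops")
print(f"crossbars per stage: {per_stage}, total: {total_crossbars}")
print(f"network ports beyond the processors: {unattached_ports}")
```

Under these assumptions, 566 processors at 20 MegaFlops each give 11.32 GigaFlops, which matches the quoted 11.3 GigaFlops after rounding, and 576 ports divide evenly into 24 crossbars of 24 ports per stage.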