Publication
ISSCC 2024
Invited talk

IBM NorthPole: An Architecture for Neural Network Inference with a 12nm Chip

Abstract

The NorthPole architecture achieves high performance and high efficiency by placing memory locally within a parallel, distributed core array. The cores are linked by networks-on-chip that ensure data availability, and their work is orchestrated by prescheduled, distributed local control. The 12nm NorthPole inference chip (22B transistors, 795 mm²) contains a 256-core array with 192 MB of distributed SRAM. At its nominal frequency of 400 MHz, it delivers more than 200, 400, and 800 TOPS at 8-, 4-, and 2-bit precision, respectively, with very high utilization.
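As a rough consistency check, the abstract's headline numbers can be turned into per-core figures. The sketch below assumes that peak throughput is spread evenly across all 256 cores at the nominal 400 MHz clock. The function names and the 2048 ops/cycle value are hypothetical illustrations and are not figures taken from the chip description.

```python
# Back-of-envelope arithmetic using only the numbers quoted in the abstract.
# Assumption: peak throughput = ops per core per cycle * cores * clock rate,
# with work spread evenly over every core.

FREQ_HZ = 400e6      # nominal clock frequency (400 MHz)
NUM_CORES = 256      # cores in the array
SRAM_MB = 192        # total distributed SRAM, in MB

# Lower bounds on peak throughput from the abstract, in TOPS, keyed by bit width.
TOPS_BY_BITS = {8: 200, 4: 400, 2: 800}


def ops_per_core_per_cycle(tops: float) -> float:
    """Minimum operations each core must complete per cycle to reach `tops`."""
    return tops * 1e12 / (FREQ_HZ * NUM_CORES)


def peak_tops(ops_per_cycle: float) -> float:
    """Chip-level TOPS for a hypothetical per-core ops/cycle budget."""
    return ops_per_cycle * NUM_CORES * FREQ_HZ / 1e12


def sram_per_core_mb() -> float:
    """Share of the distributed SRAM held locally by each core, in MB."""
    return SRAM_MB / NUM_CORES


if __name__ == "__main__":
    for bits, tops in TOPS_BY_BITS.items():
        print(f"{bits}-bit: at least {ops_per_core_per_cycle(tops):.1f} ops/core/cycle")
    # Hypothetical power-of-two budget, just above the 8-bit minimum.
    print(f"2048 ops/core/cycle -> {peak_tops(2048):.1f} TOPS")
    print(f"SRAM per core: {sram_per_core_mb()} MB")
```

Under these assumptions, the 8-bit figure needs roughly 1953 operations per core per cycle. A budget of 2048 would give about 210 TOPS, which is consistent with "more than 200." Each core holds 0.75 MB of the 192 MB total. Doubling the TOPS at each halving of precision corresponds to doubling the ops per core per cycle.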