In-memory computing accelerator architectures for throughput-critical and resource-constrained systems
Abstract
In-memory computing is a promising non-von Neumann approach in which computational tasks are performed in the memory itself by exploiting the physical attributes of the memory devices. In-memory computing tiles are well suited to multiply-and-accumulate (MAC) operations, which makes the technology attractive for deep neural network acceleration. This talk will focus on in-memory computing accelerator architectures for end-to-end deep neural network inference in throughput-critical and resource-constrained systems. First, a throughput-critical anomaly detection use case in particle physics will be introduced, and an architecture with pipelined layer execution will be presented. Second, an architecture design for always-on TinyML perception tasks will be shown. To meet the stringent area and power requirements of this resource-constrained system, a layer-serial execution methodology is adopted.
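The trade-off between the two execution styles named above can be illustrated with a minimal timing sketch. It is not the architecture presented in the talk: the per-layer times are hypothetical, the names `Schedule`, `pipelined`, and `layer_serial` are invented for illustration, and the model assumes one stage per layer while ignoring data movement and control overhead.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    active_layers: int  # layers computing at the same time (rough proxy for peak power)
    latency: float      # time for one input to pass through all layers
    interval: float     # time between consecutive outputs in steady state


def pipelined(layer_times):
    """Assumed model: every layer is its own pipeline stage, so layers
    process different inputs concurrently and the slowest stage sets the pace."""
    return Schedule(
        active_layers=len(layer_times),
        latency=sum(layer_times),
        interval=max(layer_times),
    )


def layer_serial(layer_times):
    """Assumed model: only one layer is active at a time, so each input
    must finish all layers before the next input can start."""
    total = sum(layer_times)
    return Schedule(active_layers=1, latency=total, interval=total)


# Hypothetical per-layer execution times in arbitrary units (not measured data).
layer_times = [2.0, 4.0, 3.0, 1.0]

p = pipelined(layer_times)
s = layer_serial(layer_times)

print("pipelined:   ", p)
print("layer-serial:", s)
print("throughput ratio (pipelined / layer-serial):", s.interval / p.interval)
```

Under these assumptions, pipelining shortens the interval between outputs at the cost of keeping every layer active at once, while layer-serial execution keeps a single layer active and accepts a longer interval, which is the direction one would expect for a power- and area-constrained always-on system.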