Publication
ISSCC 2024
Conference paper

A Software-Assisted Peak Current Regulation Scheme to Improve Power-Limited Inference Performance in a 5nm AI SoC


Abstract

The rapid emergence of AI models, specifically large language models (LLMs) requiring large amounts of compute, drives the need for dedicated AI inference hardware. During deployment, compute utilization (and thus power consumption) can vary significantly across the layers of an AI model, token count, precision, and batch size [1]. Such wide variation, which may occur at fast time scales, poses unique challenges in optimizing performance within the system-level specifications for discrete accelerator cards, including not just average power consumption but also peak instantaneous current draw, which may require consideration of time constants down to the μs scale [2]. Prior current-limiting systems [2], [3], which use reactive schemes and often target general-purpose processors, may not be sufficient for AI workloads. This work leverages the predictable nature of AI workloads, which enables feed-forward compile-time software optimization, and proposes a new power management architecture that minimizes worst-case margins to realize the full potential of AI accelerators. In addition, because power consumption varies widely across card components under AI workloads, sensing current at the card level (rather than the chip level) provides more opportunity for optimization. A new software-assisted feed-forward current-limiting scheme is thus proposed in conjunction with PCIe-card-level closed-loop control to maximize performance under sub-ms peak current constraints.
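The combination of compile-time (feed-forward) and runtime (closed-loop) control described above can be illustrated with a minimal sketch. All names, current values, and the proportional-control gain below are illustrative assumptions, not details from the paper: the idea is that each layer carries a predicted peak current known at compile time, the scheduler derates layers predicted to exceed the card-level budget, and a simple feedback term trims any residual error reported by card-level current sensing.

```python
# Illustrative sketch (not the paper's implementation) of software-assisted
# feed-forward current limiting plus card-level closed-loop trimming.
# All constants and layer names are hypothetical.

PEAK_CURRENT_LIMIT_A = 55.0  # card-level peak-current budget (illustrative)


def feed_forward_throttle(predicted_current_a):
    """Compile-time derating factor: scale a layer's clock/utilization so
    its predicted peak current stays within the card-level budget."""
    if predicted_current_a <= PEAK_CURRENT_LIMIT_A:
        return 1.0  # layer fits within budget; run at full rate
    return PEAK_CURRENT_LIMIT_A / predicted_current_a


def closed_loop_trim(measured_current_a, throttle, gain=0.02):
    """Runtime proportional correction using card-level current sensing:
    reduce the throttle factor when measured current overshoots the budget."""
    error = measured_current_a - PEAK_CURRENT_LIMIT_A
    if error > 0:
        throttle = max(0.1, throttle - gain * error)
    return throttle


def schedule(layers):
    """layers: list of (name, predicted_peak_current_A) pairs.
    Returns per-layer feed-forward throttle factors chosen at compile time."""
    return {name: feed_forward_throttle(i) for name, i in layers}


# Example: one layer exceeds the budget and is derated; the others run freely.
plan = schedule([("attention", 70.0), ("mlp", 48.0), ("softmax", 20.0)])
```

In this sketch the feed-forward path removes the need for worst-case margins (only the layers predicted to overshoot are throttled, and only by the predicted amount), while the closed-loop path catches prediction error at runtime.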