Performance, energy, and thermal considerations for SMT and CMP architectures
Abstract
Simultaneous multithreading (SMT) and chip multiprocessing (CMP) both allow a chip to achieve greater throughput, but their relative energy-efficiency and thermal properties are still poorly understood. This paper uses Turandot, PowerTimer, and HotSpot to explore this design space for a POWER4/POWER5-like core. For an equal-area comparison with this style of core, we find CMP to be superior in terms of performance and energy-efficiency for CPU-bound benchmarks, but SMT to be superior for memory-bound benchmarks due to a larger L2 cache. Although both exhibit similar peak operating temperatures and thermal management overheads, the mechanism by which SMT and CMP heat up are quite different. More specifically, SMT heating is primarily caused by localized heating in certain key structures, CMP heating is mainly caused by the global impact of increased energy output. Because of this difference in heat up machanism, we found that the best thermal management technique is also different for SMT and CMP. Indeed, non-DVS localized thermal-management can outperform DVS for SMT. Finally, we show that CMP and SMT will scale differently as the contribution of leakage power grows, with CMP suffering from higher leakage due to the second core's higher temperature and the exponential temperature-dependence of subthreshold leakage. © 2005 IEEE.