Performance analysis of idle programs
Abstract
This paper presents an approach for performance analysis of modern enterprise-class server applications. In our experience, performance bottlenecks in these applications differ qualitatively from bottlenecks in smaller, stand-alone systems. Small applications and benchmarks often suffer from CPU-intensive hot spots. In contrast, enterprise-class multi-tier applications often suffer from problems that manifest not as hot spots, but as idle time indicating a lack of forward motion. Many factors can contribute to undesirable idle time, including locking problems, excessive system-level activities like garbage collection, various resource constraints, and problems driving sufficient load. We present the design and methodology for WAIT, a tool to diagnose the root cause of idle time in server applications. Given lightweight samples of Java activity on a single tier, the tool can often pinpoint the primary bottleneck on a multi-tier system. The methodology centers on an informative abstraction of the states of idleness observed in a running program. This abstraction allows the tool to distinguish, for example, between hold-ups on a database machine, insufficient load, lock contention in application code, and a conventional bottleneck due to a hot method. To compute the abstraction, we present a simple expert system based on an extensible set of declarative rules. WAIT can be deployed on the fly, without modifying or even restarting the application. Many groups in IBM have applied the tool to diagnose performance problems in commercial systems, and we present a number of examples as case studies. © 2010 ACM.
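To make the idea of an abstraction computed by declarative rules concrete, here is a minimal sketch. It assumes each sample is a Java thread's state plus its stack frames, and that an ordered list of rules maps each sample to one coarse category where the first match wins. The class name IdleClassifier, the category names, and the frame patterns (such as a "jdbc" substring or ThreadPoolExecutor.getTask) are all hypothetical illustrations and are not WAIT's actual rule set. The histogram that summarize produces is one plausible way to surface the dominant idle state.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of rule-based idle-time classification. The categories,
// rule predicates, and stack-frame patterns are illustrative assumptions,
// not the rule set used by WAIT.
public class IdleClassifier {

    // Coarse states of (non-)idleness, loosely following the abstract's examples.
    enum Category { DATABASE, LOCK_CONTENTION, NO_LOAD, CPU, OTHER }

    // One sampled Java thread: its JVM thread state and its stack, top frame first.
    record ThreadSample(String state, List<String> frames) {
        boolean anyFrameContains(String text) {
            return frames.stream().anyMatch(f -> f.contains(text));
        }
    }

    // A declarative rule: when the predicate holds, the sample gets the category.
    record Rule(String name, Predicate<ThreadSample> matches, Category category) {}

    // Rules are tried in order and the first match wins, so more specific
    // rules (e.g. a database read) must precede generic ones (e.g. "running").
    static final List<Rule> RULES = List.of(
        new Rule("waiting in a JDBC socket read",
            s -> s.anyFrameContains("SocketInputStream.read") && s.anyFrameContains("jdbc"),
            Category.DATABASE),
        new Rule("blocked on a monitor",
            s -> s.state().equals("BLOCKED"),
            Category.LOCK_CONTENTION),
        new Rule("pool thread waiting for work",
            s -> s.state().endsWith("WAITING")
                 && s.anyFrameContains("ThreadPoolExecutor.getTask"),
            Category.NO_LOAD),
        new Rule("running on CPU",
            s -> s.state().equals("RUNNABLE"),
            Category.CPU));

    // Returns the category of the first matching rule, or OTHER if none match.
    static Category classify(ThreadSample sample) {
        for (Rule rule : RULES) {
            if (rule.matches().test(sample)) {
                return rule.category();
            }
        }
        return Category.OTHER;
    }

    // Aggregates many samples into a per-category histogram; the dominant
    // category hints at where the program is spending its idle time.
    static Map<Category, Integer> summarize(List<ThreadSample> samples) {
        Map<Category, Integer> counts = new EnumMap<>(Category.class);
        for (ThreadSample s : samples) {
            counts.merge(classify(s), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<ThreadSample> samples = List.of(
            new ThreadSample("RUNNABLE", List.of(
                "java.net.SocketInputStream.read", "com.example.jdbc.Driver.execute")),
            new ThreadSample("RUNNABLE", List.of(
                "java.net.SocketInputStream.read", "com.example.jdbc.Driver.execute")),
            new ThreadSample("BLOCKED", List.of("com.example.app.Cache.get")),
            new ThreadSample("RUNNABLE", List.of("com.example.app.Parser.parse")));
        System.out.println(summarize(samples));
    }
}
```

Running main prints {DATABASE=2, LOCK_CONTENTION=1, CPU=1}. In this toy input, the database category dominates, even though two of the three RUNNABLE threads would look "busy" to a CPU profiler. An ordered first-match rule list keeps each rule small and independent, which is one simple way to make such a rule set extensible.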