Accuracy and speed-up of parallel trace-driven architectural simulation
Abstract
Trace-driven simulation continues to be one of the main evaluation methods in the design of high performance processor-memory sub-systems. In this paper, we examine the varying speed-up opportunities available by processing a given trace in parallel on an IBM SP-2 machine. We also develop a simple, yet effective method of correcting for cold-start cache miss errors, by the use of overlapped trace chunks. We then report selected experimental results to validate our expectations. We Show that it is possible to achieve near-perfect speed-up without loss of accuracy. Next, in order to achieve further reduction in simulation cost, we combine uniform sampling methods with parallel trace processing with a slight loss of accuracy for finite-cache timer runs. We then show that by using warm-start sequences from preceding trace chunks, it is possible to reduce the errors back to acceptable bounds.