Using multiple threads to accelerate single thread performance
Abstract
Computing systems are being designed with an increasing number of hardware cores. To use these cores effectively, applications need to maximize the amount of parallel processing and minimize the time spent in sequential execution. In this work, we aim to exploit fine-grained parallelism beyond the parallelism already encoded in an application. We define an execution model in which a primary core and one or more secondary cores collaborate to speed up the execution of sequential code regions. This execution model relies on cores that are physically close to each other and have fast communication paths between them. For this purpose, we introduce dedicated hardware queues for low-latency transfer of values between cores, and define special 'enque' and 'deque' instructions to use the queues. Further, we develop compiler analyses and transformations to automatically derive fine-grained parallel code from sequential code regions. We implemented this model for exploiting fine-grained parallelism in the IBM XL compiler framework and in a simulator for the Blue Gene/Q system. We also studied the Sequoia benchmarks to determine code sections where our techniques are applicable. We evaluated our work on these code sections and observed average speedups of 1.32 on 2 cores and 2.05 on 4 cores. Since these code sections are otherwise executed sequentially, we conclude that our approach is useful for accelerating single thread performance.
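The abstract describes dedicated hardware queues, accessed through special 'enque' and 'deque' instructions, that let a primary core and a secondary core exchange values at low latency. As a rough illustration of that execution model only (not the paper's actual ISA, hardware queues, or compiler-generated code), the C sketch below pipelines the iterations of a sequential loop across two threads; a small software ring buffer stands in for the hardware queues, and the queue depth, function names, and split of each iteration into a "heavy" and a "light" stage are assumptions made for the example.

```c
/*
 * Hypothetical sketch of the primary/secondary-core execution model.
 * q_enque()/q_deque() model the special 'enque'/'deque' instructions
 * with a tiny single-producer/single-consumer software ring buffer;
 * the real mechanism in the paper is a dedicated hardware queue.
 */
#include <pthread.h>
#include <stdio.h>

#define N      1024
#define QDEPTH 4                        /* assumed small queue depth */

typedef struct {
    double buf[QDEPTH];
    volatile long head, tail;           /* one producer, one consumer */
} queue_t;

static void q_enque(queue_t *q, double v)
{
    while (q->tail - q->head == QDEPTH) /* spin while queue is full  */
        ;
    q->buf[q->tail % QDEPTH] = v;
    __sync_synchronize();               /* publish data before tail  */
    q->tail++;
}

static double q_deque(queue_t *q)
{
    while (q->tail == q->head)          /* spin while queue is empty */
        ;
    double v = q->buf[q->head % QDEPTH];
    __sync_synchronize();               /* read data before head     */
    q->head++;
    return v;
}

static queue_t to_secondary, from_secondary;
static double input[N];

static double heavy_stage(double x) { return x * x + 1.0; } /* offloaded  */
static double light_stage(double x) { return x * 0.5; }     /* on primary */

/* Secondary core: runs the offloaded stage of every iteration. */
static void *secondary(void *arg)
{
    for (int i = 0; i < N; i++)
        q_enque(&from_secondary, heavy_stage(q_deque(&to_secondary)));
    return arg;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        input[i] = (double)i;

    pthread_t helper;
    pthread_create(&helper, NULL, secondary, NULL);

    /* Primary core: software-pipelined loop that overlaps its own
     * stage of iteration i-1 with the secondary core's work on i.  */
    double sum = 0.0;
    q_enque(&to_secondary, input[0]);                 /* prologue          */
    for (int i = 1; i < N; i++) {
        q_enque(&to_secondary, input[i]);             /* issue next item   */
        sum += light_stage(q_deque(&from_secondary)); /* consume a result  */
    }
    sum += light_stage(q_deque(&from_secondary));     /* epilogue          */

    pthread_join(helper, NULL);
    printf("sum = %f\n", sum);
    return 0;
}
```

The software-pipelined primary loop keeps at most two values in flight across the two queues, which is why a small, fixed queue depth suffices and no deadlock can occur; this mirrors the fine-grained, producer/consumer style of parallelism the paper's compiler transformations are intended to extract automatically.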