Optimization of stateful hardware acceleration in hybrid architectures
Abstract
In many computing domains, hardware accelerators can improve throughput and lower power consumption compared with executing functionally equivalent software on general-purpose microprocessor cores. While hardware accelerators are often stateless, network processing exemplifies the need for stateful hardware acceleration. The packet-oriented streaming nature of current networks enables data to be processed as packets arrive, rather than waiting until the data of an entire network flow is available. Because many flows are processed concurrently, an accelerator must maintain state for each accelerated stream and switch contexts between them, which increases the overhead associated with acceleration. This paper proposes to dynamically reorder the requests of different accelerated streams in a hybrid on-chip/memory-based request queue to reduce this overhead. A simulation-based performance study demonstrates the effectiveness of the proposed mechanism for several popular stateful accelerators. The experimental results show that the approach significantly reduces average response time; for decompression acceleration, it improves throughput by up to 26.7% and reduces response time by up to 50% compared with a traditional FIFO-ordered design.