High-throughput, lossless data compression on FPGAs
Abstract
Loss less compression is often used before writing data to a storage medium or transmitting across a transmission medium. Compression aids by saving storage space or transmission bandwidth, a decompression operation is performed when the data is subsequently read. Though this scheme has clear benefits, the execution time of compression and decompression is critical to its application in real-time systems. Software compression utilities are often slow, leading to degraded system performance. Hardware-based solutions, on the other hand, often drive large resource requirements and are not amenable to supporting future algorithmic changes. In the current article, we present a high-throughput, streaming, loss less compression algorithm and its efficient implementation on FPGAs. The proposed solution provides a peak throughput of 1GB/sec per engine, with a sustained overall measured throughput of 2.66GB/sec on a PCIe-based FPGA board with two compression and two decompression engines. This result represents an overall speedup of 13.6x over reference software implementation. The proposed design is very lean, and, with multiple engines running in parallel, provides a path to potential speedups of up to two orders of magnitude. In the current implementation, the achievable overall throughput is limited only by the available PCIe bus bandwidth. © 2011 IEEE.