Cache-efficient wavelet lifting in JPEG 2000
Abstract
The discrete wavelet transform (DWT), the technology at the heart of the JPEG 2000 image compression system, operates on user-definable tiles of the image, whereas the discrete cosine transform (DCT) used in JPEG operates on fixed-size blocks. This difference reduces artificial blocking effects but can severely stress the memory system. We examine the interaction of the DWT and the memory hierarchy, and we modify both the structure of the DWT computation and the layout of the image data to improve cache and translation lookaside buffer (TLB) locality. Our optimized DWT implementation achieves speedups of up to 4× over the DWT in a JPEG 2000 reference implementation.
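To make the locality issue concrete, the following minimal sketch contrasts two traversal orders for the vertical pass of the reversible 5/3 lifting filter. It is an illustration only and is not claimed to be the optimized implementation described here. The function names (`vertical_53_by_column`, `vertical_53_by_row`, `_mirror`) are hypothetical, the image is assumed to be a row-major list of integer rows with at least two rows, and subband deinterleaving is omitted. Both functions compute the same coefficients; only the order in which memory is touched differs.

```python
def _mirror(i, n):
    """Whole-sample symmetric extension of row index i into [0, n)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def vertical_53_by_column(img):
    """Vertical 5/3 lifting, one column at a time.

    In row-major storage each step down a column jumps a full row
    width, which is the strided pattern that stresses cache and TLB.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for c in range(w):
        for r in range(1, h, 2):  # predict: odd rows become high-pass
            a, b = _mirror(r - 1, h), _mirror(r + 1, h)
            out[r][c] -= (out[a][c] + out[b][c]) >> 1
        for r in range(0, h, 2):  # update: even rows become low-pass
            a, b = _mirror(r - 1, h), _mirror(r + 1, h)
            out[r][c] += (out[a][c] + out[b][c] + 2) >> 2
    return out


def vertical_53_by_row(img):
    """Same lifting steps with the loops interchanged.

    Each lifting step sweeps three whole rows left to right, so
    consecutive accesses are contiguous in row-major storage.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, h, 2):  # predict
        above, below = out[_mirror(r - 1, h)], out[_mirror(r + 1, h)]
        row = out[r]
        for c in range(w):
            row[c] -= (above[c] + below[c]) >> 1
    for r in range(0, h, 2):  # update
        above, below = out[_mirror(r - 1, h)], out[_mirror(r + 1, h)]
        row = out[r]
        for c in range(w):
            row[c] += (above[c] + below[c] + 2) >> 2
    return out
```

Pure Python lists do not expose the cache behaviour directly. The sketch is meant to show the loop interchange itself, which carries over to a compiled implementation operating on a contiguous row-major buffer.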