Incremental Checkpointing for Fault-Tolerant Stream Processing Systems: A Data Structure Approach
Abstract
As the demand for high-speed stream processing grows, in-memory databases are widely used to analyze streaming data. It is challenging for in-memory systems to provide both high throughput and data persistence, because data are not stored on disk. ARIES logging and command logging are two popular logging methods, and current applications need both. However, no existing checkpointing mechanism provides the functions of both ARIES logging and command logging. Moreover, adopting ARIES logging in an in-memory database incurs high overhead, while command logging records redundant commands and has a high storage cost. To address these issues, we combine the order-irrelevant characteristics of data structures with incremental checkpointing to devise a data structure based incremental checkpointing (DSIC) mechanism. DSIC is a very low-overhead checkpointing approach that retains the features of both ARIES logging and command logging. Compared with the existing logging scheme, DSIC reduces logging time by more than 70 percent and storage cost by 40 percent.
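To make the contrast between command logging and incremental checkpointing concrete, the following is a minimal illustrative sketch in Python. It is not the DSIC mechanism itself: the `CounterStore` class, its methods, and the `recover` function are hypothetical names introduced here, and the choice of per-key sums as the order-irrelevant data structure is an assumption made only for illustration. The sketch shows that, when updates commute, a checkpoint only needs the net change per modified key, and the persisted deltas can be merged in any order during recovery.

```python
from collections import defaultdict


class CounterStore:
    """Hypothetical in-memory store of per-key sums (addition commutes)."""

    def __init__(self):
        self.state = defaultdict(int)   # live in-memory data
        self.dirty = defaultdict(int)   # net change per key since last checkpoint
        self.command_log = []           # what a command log would have to keep
        self.checkpoints = []           # incremental checkpoints (deltas only)

    def add(self, key, amount):
        """Apply one update and track it for both logging styles."""
        self.state[key] += amount
        self.dirty[key] += amount
        self.command_log.append((key, amount))

    def checkpoint(self):
        """Persist only the keys changed since the previous checkpoint."""
        delta = dict(self.dirty)
        self.checkpoints.append(delta)
        self.dirty.clear()
        return delta


def recover(checkpoints):
    """Rebuild state by merging deltas; for sums the merge order is irrelevant."""
    state = defaultdict(int)
    for delta in checkpoints:
        for key, amount in delta.items():
            state[key] += amount
    return dict(state)


store = CounterStore()
for key, amount in [("a", 1), ("b", 5), ("a", 2), ("a", 3)]:
    store.add(key, amount)
store.checkpoint()          # persists the net deltas for "a" and "b"
store.add("b", -1)
store.checkpoint()          # persists only the delta for "b"

recovered = recover(store.checkpoints)
recovered_reversed = recover(reversed(store.checkpoints))
```

In this toy run the command log holds five entries, while the two incremental checkpoints hold three key deltas in total, and recovery yields the same state whether the checkpoints are merged in forward or reverse order. Real systems would also need durable writes and support for non-commutative operations, which this sketch leaves out.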