Distributed backup scheduling: Modeling and optimization
Abstract
Recent years have seen rapid growth in data storage, magnifying the importance of ensuring data safety through regular backups. However, the traffic created by such backups can place a significant burden on the underlying communication network. In this paper we address the tradeoff between backing up frequently (increased safety) and reducing the network peak load. Specifically, we consider the problem of shifting backup traffic from peak hours to off-peak hours within the constraints imposed by user connectivity. Backups are scheduled using a distributed protocol characterized by a set of probabilities, each indicating the likelihood that a user initiates a backup during a given hour. Given these probabilities, we study the network capacity by investigating the rate at which users can generate data while the backlog processes remain stable. We then derive explicit expressions for the stationary behavior of the backup process, and discuss how to choose backup probabilities that strike the right balance between a low peak load and data safety. Simulation experiments show that this approach is highly successful in reducing costs. © 2014 IEEE.
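The probabilistic protocol summarized in the abstract can be illustrated with a small simulation. The sketch below is not the paper's model. It assumes that each user generates data at a constant hourly rate, is online in hour h independently with probability `online[h]`, and uploads its entire backlog within the hour in which a backup starts. The names `simulate_backups`, `probs`, `online`, and `data_rate` are hypothetical and chosen only for illustration.

```python
import random


def simulate_backups(probs, online, data_rate, days, n_users, seed=0):
    """Simulate an hourly probabilistic backup protocol (illustrative only).

    probs[h]  : probability that an online user starts a backup in hour h
    online[h] : probability that a user is connected in hour h (assumed model)
    data_rate : data generated per user per hour, in arbitrary units

    Returns (avg_load, mean_backlog), where avg_load[h] is the backup traffic
    in hour h averaged over the simulated days and mean_backlog is the mean
    amount of data per user still waiting to be backed up at the end.
    """
    rng = random.Random(seed)
    backlog = [0.0] * n_users
    load = [0.0] * 24
    for _ in range(days):
        for h in range(24):
            for u in range(n_users):
                # New data accumulates every hour, whether or not a backup runs.
                backlog[u] += data_rate
                # A backup needs the user to be online and the coin flip to succeed.
                if rng.random() < online[h] and rng.random() < probs[h]:
                    # Assumption: the whole backlog is uploaded within this hour.
                    load[h] += backlog[u]
                    backlog[u] = 0.0
    avg_load = [x / days for x in load]
    return avg_load, sum(backlog) / n_users


if __name__ == "__main__":
    # Hypothetical schedule: no backups during peak hours 8:00-20:00.
    probs = [0.0 if 8 <= h < 20 else 0.5 for h in range(24)]
    online = [0.8] * 24
    avg_load, mean_backlog = simulate_backups(probs, online, 1.0, days=30, n_users=200)
    print("peak hourly load:", max(avg_load))
    print("mean residual backlog per user:", mean_backlog)
```

Lowering the probabilities for peak hours removes traffic from those hours, but the data generated in the meantime stays unprotected for longer and arrives as a larger upload once backups resume. This is the tradeoff between peak load and data safety that the choice of backup probabilities has to balance.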