Greener Data Exchange in the Cloud: A Coding-Based Optimization for Big Data Processing
Abstract
The rise of the cloud and distributed data-intensive (big data) applications puts pressure on data center networks due to the movement of massive volumes of data. Reducing the volume of communication is pivotal to greener data exchange through efficient utilization of network resources. This paper proposes the use of a mixing technique, spate coding, working in tandem with software-defined network control, as a means of dynamically controlled reduction in the volume of communication. We introduce motivating real-world use cases and present a novel spate coding algorithm for data center networks. We also analyze the computational complexity of the general problem of minimizing the volume of communication in a distributed data center application without degrading the rate of information exchange, and provide theoretical limits of such schemes. Moreover, we bridge the gap between theory and practice with a proof-of-concept implementation of the proposed system in a real-world data center. We use Hadoop MapReduce, the most widely used big data processing framework, as our target. Experimental results on two industry-standard benchmarks show the advantage of the proposed system over a vanilla Hadoop implementation, an in-network combiner, and Combine-N-Code. The proposed coding-based scheme shows performance improvements in terms of volume of communication (up to 62%), goodput (up to 76%), disk utilization (up to 38%), and the number of bits that can be transmitted per Joule of energy (up to 200%).