Abstract
With the rise in the use of data centers comprised of commodity clusters for data-intensive applications, the energy efficiency of these setups is becoming a paramount concern for data center operators. Moreover, applications developed for Hadoop framework, which has now become a de-facto implementation of the MapReduce framework, now comprise complex workflows that are managed by specialized workflow schedulers, such as Oozie. These schedulers assume cluster resources to be homogeneous and often consider data locality to be the only scheduling constraint. However, this is increasingly not the case in modern data centers. The addition of low-power computing devices and regular hardware upgrades have made heterogeneity the norm, in that clusters are now comprised of several logical sub-clusters each with its own performance and energy profile. In this paper we present oSched, a workflow scheduler that profiles the performance and the energy characteristics of applications on each hardware sub-cluster in a heterogeneous cluster in order to improve the applicationresource match while ensuring energy efficiency and performance related Service Level Agreement (SLA) goals. oSched borrows from our earlier work, fSched, a hardware-aware scheduler, that improves the resource-application match to improve application performance. We evaluate oSched on three clusters with different hardware configurations and energy profiles, where each subcluster comprises of five homogeneous nodes. Our evaluation of oSched shows that application performance and power characteristics vary significantly across different hardware configurations. We show that the hardware-aware scheduling can perform 12.8% faster, while saving 21% more power than hardware oblivious scheduling for the studied applications.