Managing data center tickets: Prediction and active sizing
Abstract
Performance ticket handling is an expensive operation in highly virtualized cloud data centers, where physical boxes host multiple virtual machines (VMs). A large body of tickets arises from resource usage warnings, e.g., CPU and RAM usage that exceeds predefined thresholds. Both the transient nature of CPU and RAM usage and its strong correlation across time among co-located VMs drastically increase the complexity of ticket management. Based on a large resource usage dataset collected from production data centers, amounting to 6K physical machines and more than 80K VMs, we first discover patterns of spatial dependency among co-located virtual resources. Leveraging our key findings, we develop an Active Ticket Managing (ATM) system that consists of (i) a novel time series prediction methodology and (ii) a proactive VM resizing policy for the CPU and RAM resources of co-located VMs on a physical box, which aims to drastically reduce usage tickets. ATM exploits the spatial dependency across multiple resources of co-located VMs for usage prediction and proactive VM resizing. Evaluation results on traces of 6K physical boxes and on a prototype of a MediaWiki system show that ATM achieves excellent prediction accuracy for a large number of VM time series and a significant usage ticket reduction, i.e., up to 60%, at low computational overhead.
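
To make the notion of a usage ticket concrete, the following minimal sketch counts the sampling windows in which a VM's utilization exceeds a predefined threshold, before and after its allocated capacity is enlarged. It is an illustration only, not the ATM algorithm: the function name `count_usage_tickets`, the 60% threshold, and the demand values are all hypothetical assumptions.

```python
from typing import Sequence


def count_usage_tickets(
    demand: Sequence[float],
    capacity: float,
    threshold: float = 0.6,  # hypothetical threshold, not taken from the paper
) -> int:
    """Count sampling windows whose utilization (demand / capacity) exceeds the threshold."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return sum(1 for d in demand if d / capacity > threshold)


# Hypothetical CPU demand of one VM over five sampling windows, in GHz.
cpu_demand_ghz = [1.0, 1.5, 2.6, 2.8, 1.2]

# With 4 GHz allocated, two windows exceed 60% utilization.
before = count_usage_tickets(cpu_demand_ghz, capacity=4.0)

# With 5 GHz allocated, the same demand never exceeds the threshold.
after = count_usage_tickets(cpu_demand_ghz, capacity=5.0)

print(before, after)
```

In this toy setting the capacity is enlarged after the demand is already known. A proactive policy such as the one described above would instead have to choose the new size from predicted demand, and under the shared capacity of the physical box.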