A Host-Agnostic, Supervised Machine Learning Approach to Automated Overload Detection in Virtual Machine Workloads
Abstract
This paper evaluates a mechanism for applying machine learning (ML) to identify over-constrained Infrastructure-as-a-Service (IaaS) virtual machines (VMs). Herein, over-constrained VMs are defined as those that are not allocated sufficient system resources to meet their workload-specific objective functions. To validate our approach, we used a variety of workload-specific benchmarks inspired by common IaaS cloud workloads. Workloads were run while VM resource consumption features exposed by the hypervisor were sampled at regular intervals. The resulting datasets were labeled as nominal or over-constrained and used to train ML classifiers that derive VM over-constraint rules from a one-time workload analysis. Rules learned on one host are transferred with the VM to other host environments to assess their portability. Key contributions of this work include demonstrating which VM resource consumption metrics (features) prove most relevant to the learned decision trees in this context, and a technique for generalizing the approach across hosts while limiting the required up-front training expenditure to a single VM and host. Further contributions include a rigorous explanation of how learned rule sets differ as a function of feature sampling rate, and an analysis of how they differ as a function of workload variation. Feature correlation matrices and their corresponding generated rule sets demonstrate that the individual features comprising a rule set tend to show low cross-correlation (below 0.4), while no individual feature shows high direct correlation with the classification. Our system achieves workload-specific error rates below 2.4%, with a mean error across workloads of 1.43% and a strong false-negative bias, for the variety of synthetic, representative cloud workloads tested.
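As a concrete illustration of the classifier-training step summarized above, the following minimal sketch trains a decision tree on labeled hypervisor feature samples and prints the resulting rule set. The feature names, the CSV layout, and the use of scikit-learn are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
# Minimal sketch: train a decision-tree classifier to flag over-constrained VMs
# from hypervisor-exposed resource-consumption samples.
# NOTE: feature names, file layout, and library choice are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical dataset: one row per sampling interval, labeled during curation
# as 0 (nominal) or 1 (over-constrained) for the workload under test.
samples = pd.read_csv("vm_feature_samples.csv")
feature_cols = ["cpu_usage_pct", "mem_active_mb", "disk_read_iops",
                "disk_write_iops", "net_rx_kbps", "net_tx_kbps"]
X = samples[feature_cols]
y = samples["over_constrained"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# A shallow tree keeps the learned over-constraint rules compact enough to
# transfer with the VM and inspect on a new host.
clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(X_train, y_train)

print("hold-out error: {:.2%}".format(1.0 - clf.score(X_test, y_test)))
print(export_text(clf, feature_names=feature_cols))  # human-readable rule set
```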