AUTOLOOP: Automated action selection in the "Observe-Analyze-Act" loop for storage systems
Abstract
Enterprise applications typically depend on guaranteed performance from the storage subsystem, lest they fail. However, changes in workload characteristics, component failures, and load surges all threaten that guarantee. Given that widespread access protocols and scheduling policies are largely best-effort, meeting performance goals on a shared system is very difficult, and is currently accomplished by human administrators using a 24 × 7 Observe-Analyze-Act (OAA) loop. AUTOLOOP is an OAA automation framework that combines self-refining models with constrained optimization techniques. This paper gives an overview of the automation process and focuses on the analyze step of the loop, which selects the corrective action. Action selection today is "black magic": human administrators rely on years of experience and coarse-grained heuristics to choose from a spectrum of actions ranging from short-term tuning (such as throttling of workloads) to long-term modifications (such as migration of data among the available resources). AUTOLOOP is the first framework of its kind within storage systems to formalize the task of action selection as a machine-executable constraint-solving problem. AUTOLOOP exhaustively searches the solution space of corrective actions, uses skyline analysis to short-list a subset of low-cost, high-benefit actions, and selects the optimal set of actions along with a schedule for invoking them. The action selection takes into account the cost of action invocation, the expected benefit, the current and future workload needs, the overall load pattern on the system, and the application-level Service Level Objectives (SLOs). © 2005 IEEE.
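
To illustrate the skyline short-listing step mentioned above, the following is a minimal Python sketch, not the AUTOLOOP implementation. It assumes each candidate action is summarized by a single scalar cost and a single scalar benefit; the `Action` type, the action names, and all numeric values are hypothetical and chosen only for illustration. An action is kept if no other action is at least as cheap and at least as beneficial while being strictly better in one of the two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical summary of a corrective action (not from the paper)."""
    name: str
    cost: float     # invocation cost; lower is better
    benefit: float  # expected benefit; higher is better


def dominates(a: Action, b: Action) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.cost <= b.cost and a.benefit >= b.benefit
    strictly_better = a.cost < b.cost or a.benefit > b.benefit
    return no_worse and strictly_better


def skyline(actions: list[Action]) -> list[Action]:
    """Return the actions not dominated by any other action (input order kept)."""
    return [a for a in actions if not any(dominates(b, a) for b in actions)]


# Illustrative candidates; names and numbers are assumptions.
candidates = [
    Action("throttle_workload", cost=1.0, benefit=2.0),
    Action("migrate_data", cost=8.0, benefit=9.0),
    Action("add_cache", cost=5.0, benefit=4.0),
    Action("replicate_volume", cost=9.0, benefit=6.0),  # dominated by migrate_data
    Action("rebalance_slow", cost=6.0, benefit=3.0),    # dominated by add_cache
]

shortlist = skyline(candidates)
for action in shortlist:
    print(action.name, action.cost, action.benefit)
```

This pairwise comparison is quadratic in the number of candidates, which is acceptable for a sketch; the final selection and scheduling of actions from the short-list, as described in the abstract, would be a separate optimization step that is not shown here.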