Publication
CoNEXT 2016
Conference paper

GRETEL: Lightweight fault localization for OpenStack

Abstract

Like any other distributed system, cloud management stacks such as OpenStack are susceptible to faults whose root causes are often hard to diagnose and may take hours or days to fix. We present GRETEL, a system that leverages nonintrusive system monitoring to expedite root cause analysis of both operational and performance faults manifesting in OpenStack operations. GRETEL uses unique operational fingerprints to quickly identify faulty operations at runtime. GRETEL is accurate in its diagnosis, achieving >98% precision in identifying the faulty operation with very few false positives, even under stress. GRETEL is lightweight and orders of magnitude faster than prior work, sustaining a throughput of ~77 Mbps.
