Account clustering in multi-tenant storage management environments
Abstract
Multi-tenant storage management environments typically manage multiple enterprise accounts with heterogeneous storage footprints. Identifying accounts with similar storage footprints and grouping them into clusters reduces account management overhead and provides a framework for data-driven storage recommendation services. This paper describes a method for clustering accounts in multi-tenant storage management environments. Storage system vendors, models, and footprints are captured as a set of properties, and a pairwise distance function grades the similarity between accounts along multiple dimensions. A graph representation is then built from the account similarity values. Finally, the clustering algorithm is defined as a graph algorithm that repeatedly finds a maximum clique and removes it from the graph. The result is a set of clusters, each grouping together accounts with very similar storage footprints. Clusters are then rated along multiple metrics, compared to their peers along multiple performance dimensions, and given recommendations on how to further improve storage efficiency.
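The pipeline summarized above (properties, pairwise distance, similarity graph, repeated maximum-clique removal) can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper. The property names, the mismatch-fraction distance, the similarity threshold, and the function names `distance`, `build_graph`, `maximum_clique`, and `cluster_accounts` are all assumptions made for illustration. The maximum clique is found with a Bron-Kerbosch search with pivoting, which is exponential in the worst case and so only suitable for small graphs.

```python
from itertools import combinations

# Hypothetical accounts: the property names and values are illustrative only.
EXAMPLE_ACCOUNTS = {
    "acme": {"vendor": "VendorA", "model": "M1", "tier": "large"},
    "bolt": {"vendor": "VendorA", "model": "M1", "tier": "large"},
    "core": {"vendor": "VendorA", "model": "M1", "tier": "medium"},
    "dyna": {"vendor": "VendorB", "model": "X9", "tier": "small"},
    "echo": {"vendor": "VendorB", "model": "X9", "tier": "small"},
}


def distance(a, b):
    """Assumed distance: fraction of properties on which two accounts differ."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(a.get(k) != b.get(k) for k in keys) / len(keys)


def build_graph(accounts, threshold):
    """Link two accounts when their distance is at most `threshold` (assumed rule)."""
    graph = {name: set() for name in accounts}
    for u, v in combinations(accounts, 2):
        if distance(accounts[u], accounts[v]) <= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def maximum_clique(graph):
    """Return one maximum clique via Bron-Kerbosch with pivoting."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest seen so far.
            if len(r) > len(best):
                best = set(r)
            return
        if len(r) + len(p) <= len(best):
            return  # this branch cannot produce a strictly larger clique
        pivot = max(p | x, key=lambda n: len(graph[n] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return best


def cluster_accounts(accounts, threshold):
    """Repeatedly extract a maximum clique and remove it from the graph."""
    graph = build_graph(accounts, threshold)
    clusters = []
    while graph:
        clique = maximum_clique(graph)
        clusters.append(sorted(clique))
        for node in clique:
            del graph[node]
        for neighbours in graph.values():
            neighbours -= clique
    return clusters
```

With the sample data and a threshold of 0.34, `cluster_accounts(EXAMPLE_ACCOUNTS, 0.34)` groups the three VendorA accounts into one cluster and the two VendorB accounts into another. Requiring a clique rather than a connected component means every pair of accounts within a cluster satisfies the threshold, which matches the goal of clusters containing only very similar footprints. An account with no sufficiently similar neighbour ends up in a cluster of size one.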