Fast approximation of matrix coherence and statistical leverage
Abstract
The statistical leverage scores of a data matrix are the squared row-norms of any matrix whose columns are obtained by orthogonalizing the columns of the data matrix; and, the coherence is the largest leverage score. These quantities play an important role in several machine learning algorithms because they capture the key structural nonuniformity of the data matrix that must be dealt with in developing efficient randomized algorithms. Our main result is a randomized algorithm that takes as input an arbitrary nxd matrix A, with n ≫ d, and returns, as output, relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs in O(nd log n) time, as opposed to the O(nd 2) time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. This resolves an open question from (Drineas et al., 2006) and (Mohri & Talwalkar, 2011); and our result leads to immediate improvements in coreset-based ℓ 2-regression, the estimation of the coherence of a matrix, and several related low-rank matrix problems. Interestingly, to achieve our result we judiciously apply random projections on both sides of A. Copyright 2012 by the author(s)/owner(s).