Low-storage online estimators for quantiles and densities
Abstract
The traditional estimator ξp, n for the p-quantile ξp of a random variable X, given n observations from the distribution of X, is obtained by inverting the empirical cumulative distribution function (cdf) constructed from the obtained observations. The estimator ξp, n requires O(n) storage, and it is well known that the mean squared error of ξp, n (with respect to p) decays as O(n-1). In this article, we present an alternative to ξp, n that seems to require dramatically less storage with negligible loss in convergence rate. The proposed estimator, ξ p, n, relies on an alternative cdf that is constructed by accumulating the observed random variâtes into variable-sized bins that progressively become finer around the quantile. The size of the bins are strategically adjusted to ensure that the increased bias due to binning does not adversely affect the resulting convergence rate. We present an 'online' version of the estimator ξp, n, along with a discussion of results on its consistency, convergence rates, and storage requirements. We also discuss analogous ideas for density estimation. We limit ourselves to heuristic arguments in support of the theoretical assertions we make, reserving more detailed proofs to a forthcoming paper. © 2013 IEEE.