The data stream space complexity of cascaded norms
Abstract
We consider the problem of estimating cascaded aggregates over a matrix presented as a sequence of updates in a data stream. A cascaded aggregate P o Q is defined by evaluating aggregate Q repeatedly over each row of the matrix, and then evaluating aggregate P over the resulting vector of values. This problem was introduced by Cormode and Muthukrishnan, PODS, 2005 [CM]. We analyze the space complexity of estimating cascaded norms on an n × d matrix to within a small relative error. Let Lp denote the p-th norm, where p is a non-negative integer. We abbreviate the cascaded norm Lk o L p by Lk,p. (1) For any constant k ≥ p ≥ 2, we obtain a 1-pass Õ (n1-2/kd1-2/p)-space algorithm for estimating Lk,p. This is optimal up to polylogarithmic factors and resolves an open question of [CM] regarding the space complexity of L 4,2. We also obtain 1-pass space-optimal algorithms for estimating L∞ and L k,∞ (2) We prove a space lower bound of Ω(n1-1/k) on estimating Lk,0 and Lk,1, resolving an open question due to Indyk, IITK Data Streams Workshop (Problem 8), 2006. We also resolve two more questions of [CM] concerning Lk2 estimation and block heavy hitter problems. Ganguly, Bansal and Dube (FAW, 2008) claimed an Õ(1)-space algorithm for estimating Lk,p for any k,p ∈ [0,2]. Our lower bounds show this claim is incorrect. © 2009 IEEE.