Computational bounds on statistical query learning
Abstract
We study the complexity of learning in Kearns' well-known statistical query (SQ) learning model (Kearns, 1993). A number of previous works have addressed the definition and estimation of the information-theoretic bounds on the SQ learning complexity, in other words, bounds on the query complexity. Here we give the first strictly computational upper and lower bounds on the complexity of several types of learning in the SQ model. As it was already observed, the known characterization of distribution-specific SQ learning (Blum, et al. 1994) implies that for weak learning over a fixed distribution, the query complexity and computational complexity are essentially the same. In contrast, we show that for both distribution-specific and distribution-independent (strong) learning there exists a concept class of polynomial query complexity that is not efficiently learnable unless RP = NP. We then prove that our distribution-specific lower bound is essentially tight by showing that for every concept class C of polynomial query complexity there exists a polynomial time algorithm that given access to random points from any distribution D and an NP oracle, can SQ learn C over D. We also consider a restriction of the SQ model, the correlational statistical query (CSQ) model (Bshouty and Feldman, 2001; Feldman, 2008) of learning which is closely-related to Valiant's model of evolvability (Valiant, 2007). We show a similar separation result for distribution-independent CSQ learning under a stronger assumption: There exists a concept class of polynomial CSQ query complexity which is not efficiently learnable unless every problem in W[P] has a randomized fixed parameter tractable algorithm. © 2012 V. Feldman & V. Kanade.