A Statistical Interpretation of the Maximum Subarray Problem
Abstract
Maximum subarray is a classical problem in computer science that given an array of numbers aims to find a contiguous subarray with the largest sum. We focus on its use for a noisy statistical prob- lem of localizing an interval with a mean different from background. While a naive application of maximum subarray fails at this task, both a penalized and a constrained version can succeed. We show that the penalized version can be derived for common exponential family distributions, in a manner similar to the change-point detec- tion literature, and we interpret the resulting optimal penalty value. The failure of the naive formulation is then explained by an analysis of the estimated interval boundaries. Experiments further quantify the effect of deviating from the optimal penalty. We also relate the penalized and constrained formulations and show that the solutions to the former lie on the convex hull of the solutions to the latter.