Hypothesis selection and testing by the MDL principle
Abstract
The central idea of the MDL (Minimum Description Length) principle is to represent a class of models (hypotheses) by a universal model capable of imitating the behavior of any model in the class. The principle calls for the model class whose representative assigns the largest probability or density to the observed data. Two examples of universal models for a parametric class $\mathcal{M}$ are the normalized maximum likelihood (NML) model $\hat f(x^n \mid \mathcal{M}) = f(x^n \mid \hat\theta(x^n)) \big/ \int_\Omega f(y^n \mid \hat\theta(y^n))\, dy^n$, where $\Omega$ is an appropriately selected set, and a mixture $f_w(x^n \mid \mathcal{M}) = \int f(x^n \mid \theta)\, w(\theta)\, d\theta$ as a convex linear functional of the models. In this interpretation a Bayes factor is the ratio of the mixture representatives of two model classes. However, mixtures need not be the best representatives, and as will be shown the NML model provides a strictly better test for the mean being zero in the Gaussian cases where the variance is known or taken as a parameter.
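As a concrete illustration of the known-variance Gaussian case mentioned above, the sketch below compares code lengths (negative log densities) of the null model with mean zero and the NML representative of the mean-parameter class, preferring the class with the shorter code length. The restriction of the mean to $\Omega = [-R, R]$, the resulting closed-form normalizer $2R\sqrt{n/(2\pi\sigma^2)}$ (a standard result for this location family), and all function names and parameter values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def log_lik_gaussian(x, mu, sigma):
    """Log-likelihood of i.i.d. samples x under N(mu, sigma^2)."""
    n = len(x)
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - np.sum((x - mu)**2) / (2 * sigma**2)

def log_nml_gaussian_mean(x, sigma, R):
    """Log NML density for the Gaussian mean class with known sigma,
    mean restricted to Omega = [-R, R] (an assumed choice of Omega).

    The normalizing integral over y^n with mean in Omega has the closed
    form 2R * sqrt(n / (2*pi*sigma^2)) for this location family."""
    n = len(x)
    xbar = np.mean(x)                      # maximum likelihood estimate of the mean
    log_max_lik = log_lik_gaussian(x, xbar, sigma)
    log_normalizer = np.log(2 * R) + 0.5 * np.log(n / (2 * np.pi * sigma**2))
    return log_max_lik - log_normalizer

# MDL-style test: shorter code length (larger normalized density) wins.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.0, size=50)  # synthetic data, true mean 0.5
sigma, R = 1.0, 10.0
len_null = -log_lik_gaussian(x, 0.0, sigma)       # null hypothesis: mean = 0
len_alt = -log_nml_gaussian_mean(x, sigma, R)     # NML of the mean-parameter class
print("null code length:", len_null, "NML code length:", len_alt)
print("reject mean = 0" if len_alt < len_null else "retain mean = 0")
```

The analogous mixture-based (Bayes factor) test would replace the NML code length with that of $f_w(x^n \mid \mathcal{M})$ under a chosen prior $w$; the paper's claim is that the NML representative yields a strictly better test in this setting.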