Block variable selection in multivariate regression and high-dimensional causal inference
Abstract
We consider multivariate regression problems involving high-dimensional predictor and response spaces. To efficiently address such problems, we propose a variable selection method, Multivariate Group Orthogonal Matching Pursuit, which extends the standard Orthogonal Matching Pursuit technique. This extension accounts for arbitrary sparsity patterns induced by domain-specific groupings over both input and output variables, while also taking advantage of the correlation that may exist between themultiple outputs. Within this framework,we then formulate the problemof inferring causal relationships over a collection of high-dimensional time series variables. When applied to time-evolving social media content, our models yield a new family of causality-based influence measures that may be seen as an alternative to the classic PageRank algorithm traditionally applied to hyperlink graphs. Theoretical guarantees, extensive simulations and empirical studies confirm the generality and value of our framework.