Detection of Unobserved Common Cause in Discrete Data Based on the MDL Principle
Abstract
Inferring the causal structure among random variables from observational data alone is a crucial scientific problem. This study classifies the causal relationship between two discrete random variables into four categories: the two directions of direct causality, the presence of a common cause, and independence. Several existing methods can infer, from the joint distribution of two discrete random variables, which direction of direct causality holds, but few methods can directly infer which of the four categories applies. We propose the first method that infers the causal relationship in discrete data without any assumption on the unobserved confounder. Based on the minimum description length (MDL) principle, the proposed method computes the code length of the observed data under each causal model and selects the model that yields the minimum code length. Using synthetic data, we demonstrate that the proposed method is effective in detecting common causes and outperforms existing methods in inference accuracy. We further show that the proposed method effectively detects an unobserved confounder in real-world data.
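The MDL selection step (compute a code length per candidate model, keep the shortest) can be illustrated with a minimal sketch. It assumes a simple two-part code, namely the maximum-likelihood data cost in bits plus an asymptotic parameter cost of (k/2) log2 n bits, and compares only two candidate models: independence and unrestricted dependence between X and Y. The function names are hypothetical, and this is not the paper's coding scheme for the four causal categories.

```python
import math
from collections import Counter


def empirical_bits(counts, n):
    # Data cost under the maximum-likelihood (empirical) distribution, in bits.
    return -sum(c * math.log2(c / n) for c in counts.values() if c > 0)


def two_part_length(data_bits, num_params, n):
    # Assumed parameter cost: (k/2) * log2(n) bits, a common asymptotic approximation.
    return data_bits + 0.5 * num_params * math.log2(n)


def code_length_independent(xs, ys):
    # Hypothetical model "X and Y independent": P(X) * P(Y).
    n = len(xs)
    cx, cy = Counter(xs), Counter(ys)
    data_bits = empirical_bits(cx, n) + empirical_bits(cy, n)
    k = (len(cx) - 1) + (len(cy) - 1)
    return two_part_length(data_bits, k, n)


def code_length_dependent(xs, ys):
    # Hypothetical model "X and Y dependent": unrestricted joint P(X, Y).
    n = len(xs)
    cx, cy = Counter(xs), Counter(ys)
    cxy = Counter(zip(xs, ys))
    data_bits = empirical_bits(cxy, n)
    k = len(cx) * len(cy) - 1
    return two_part_length(data_bits, k, n)


def select_model(xs, ys):
    # MDL principle: pick the candidate whose total code length is smallest.
    lengths = {
        "independent": code_length_independent(xs, ys),
        "dependent": code_length_dependent(xs, ys),
    }
    return min(lengths, key=lengths.get), lengths


if __name__ == "__main__":
    xs = [0, 1] * 50
    print(select_model(xs, xs)[0])  # Y is a copy of X
    print(select_model([0, 0, 1, 1] * 25, [0, 1, 0, 1] * 25)[0])  # balanced, unrelated
```

Under this simplified code, the factorizations X→Y and Y→X both reduce to the empirical joint likelihood with |X||Y| − 1 free parameters, so they receive identical code lengths. Distinguishing the two directions, and separating them from a common cause, therefore requires model-specific codes beyond this sketch.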