Active Learning of Local Causal Pathways from High-Dimensional Data: New Methods and Empirical Comparison
Alexander Statnikov
Abstract: Discovering causal networks from data is a fundamental problem in several computational disciplines. Several sound algorithms have been proposed that use high-dimensional observational data to infer causal relations. However, observational data are in general insufficient to uncover all causal relations among the observed variables, because many causal relations cannot be statistically distinguished with observational data alone. It is therefore essential to refine discoveries made from observational data with limited, targeted experimental data. This need has led to the recent development of several methods for active learning of causal networks, which use both observational and experimental data to discover causal networks. We propose new, accurate, and experimentally efficient methods for discovering local causal pathways, which contain only the direct causes and direct effects of the response (target) variables of interest. We conduct a comprehensive evaluation of new and existing methods on data with up to 1,000,000 variables, using both artificially simulated networks and in-silico gene transcriptional networks that model the characteristics of real gene expression data.
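To make concrete why observational data alone can leave causal relations undetermined, here is a minimal illustrative sketch that is not taken from the paper. It compares two hypothetical linear-Gaussian models, X → Y and Y → X, with an assumed edge coefficient `B = 0.8` and noise variance chosen so that both variables have unit variance. Both models imply the same observational covariance, so they cannot be told apart from observational data, yet an intervention that clamps X (do(X = x)) shifts the mean of Y only when X actually causes Y. The function names and coefficients are hypothetical.

```python
# Hypothetical two-variable linear-Gaussian example (illustrative values only).
B = 0.8                # assumed edge coefficient
NOISE_VAR = 1 - B**2   # noise variance that keeps both marginal variances at 1


def observational_cov(model):
    """Return (var_x, var_y, cov_xy) implied by the model with no intervention."""
    if model == "X->Y":
        var_x = 1.0
        var_y = B**2 * var_x + NOISE_VAR  # Y = B*X + noise
        cov = B * var_x
    elif model == "Y->X":
        var_y = 1.0
        var_x = B**2 * var_y + NOISE_VAR  # X = B*Y + noise
        cov = B * var_y
    else:
        raise ValueError(f"unknown model: {model}")
    return (var_x, var_y, cov)


def mean_y_under_do_x(model, x_value):
    """E[Y | do(X = x_value)]: X is clamped, so any edge into X is cut."""
    if model == "X->Y":
        return B * x_value  # Y still responds to the clamped X
    if model == "Y->X":
        return 0.0          # Y is upstream of X, so its mean is unchanged
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    print(observational_cov("X->Y"), observational_cov("Y->X"))
    print(mean_y_under_do_x("X->Y", 2.0), mean_y_under_do_x("Y->X", 2.0))
```

The two observational covariance triples agree up to floating-point rounding, while the interventional means differ (about 1.6 versus 0.0). This is the kind of ambiguity that targeted experiments are meant to resolve.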