Carnegie Mellon University
Theory of Causation

Causation and inductive inference have been linked in the philosophical literature since David Hume. The Department’s contribution to the foundations of causation and causal discovery over the past two decades has transformed the subject and is having influence not only within philosophy, computer science, and statistics, but also in the social sciences, biology, and even planetary science. The basic idea is that, although correlation or statistical dependence cannot determine the causal relationship between two variables, it can, under plausible assumptions, determine some causal relationships when three or more variables are considered. That allows for algorithms that can sometimes recover features of the causal structure of an unknown system from patterns of correlation alone. The following is a brief list of our work in this area since 2008.
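The basic idea can be illustrated with simulated data. The sketch below is not one of the Department's algorithms; it assumes linear Gaussian toy models with made-up coefficients and uses partial correlation as a stand-in for a conditional independence test. It contrasts a common effect (X -> Z <- Y) with a chain (X -> Z -> Y): the two structures leave opposite patterns of dependence among the three variables, which is what lets a search procedure tell them apart.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def partial_corr(a, b, c):
    """Correlation of a and b after linearly controlling for c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Common effect (collider): X -> Z <- Y, with X and Y generated independently.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
z = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]
collider_marginal = pearson(x, y)        # close to zero
collider_given_z = partial_corr(x, y, z)  # clearly nonzero

# Chain: X -> Z -> Y.
x2 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [xi + rng.gauss(0, 1) for xi in x2]
y2 = [zi + rng.gauss(0, 1) for zi in z2]
chain_marginal = pearson(x2, y2)          # clearly nonzero
chain_given_z = partial_corr(x2, y2, z2)  # close to zero

print(f"collider: corr(X,Y)={collider_marginal:.3f}, given Z={collider_given_z:.3f}")
print(f"chain:    corr(X,Y)={chain_marginal:.3f}, given Z={chain_given_z:.3f}")
```

In the collider case X and Y are uncorrelated until one conditions on Z; in the chain the pattern is reversed. With only X and Y observed, neither pattern would be visible.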

Confidence bounds for causal inference. Spirtes, Glymour and Scheines showed that causal discovery converges to the truth under plausible conditions (i.e., is point-wise consistent). That does not entail that one can say a priori how probable it is that one is close to the truth at a given sample size. Spirtes and former student J. Zhang have investigated strengthened versions of the Faithfulness assumption and revised algorithms under which uniform consistency obtains.

Fundamental assumptions. Several fundamental assumptions link probability distributions to causal relations and serve as the basis of the theory of causal inference. The Causal Markov Assumption states that each variable is independent of its non-effects conditional on its direct causes. The Causal Faithfulness Assumption states that the only conditional independencies that hold in a population are those entailed by the Causal Markov Assumption. The validity of these assumptions has come under intense philosophical scrutiny. Spirtes has explored to what extent the Causal Markov Assumption is invariant under transformation of variables. Spirtes and former student Zhang have explored several variants of the Causal Faithfulness Assumption that weaken it in some respects and strengthen it in others. This work has yielded practical implications for improving causal inference algorithms. Spirtes and Zhang have also investigated what inferences can be made when the Causal Faithfulness Assumption is replaced with the much weaker Causal Minimality Assumption (i.e., the population distribution does not fit any proper subgraph of the true causal graph).
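For readers who prefer notation, the three assumptions can be written as follows. This is one common formalization, offered as a sketch rather than as the exact statements used in the work described above. The symbols are assumptions introduced here: G is the true causal graph (taken to be a directed acyclic graph) over a variable set V, P is the population distribution, Pa(X) denotes the direct causes of X, and ND(X) denotes its non-effects.

```latex
% Notation (assumed for this sketch): G = causal DAG over V, P = population distribution,
% Pa(X) = direct causes of X in G, ND(X) = non-effects (non-descendants) of X in G.
\begin{align*}
\text{Causal Markov:}\quad
  & X \perp\!\!\!\perp \bigl(\mathrm{ND}(X) \setminus \mathrm{Pa}(X)\bigr) \mid \mathrm{Pa}(X)
    \quad \text{in } P, \text{ for every } X \in V \\
\text{Causal Faithfulness:}\quad
  & A \perp\!\!\!\perp B \mid C \ \text{in } P
    \;\Longrightarrow\; A \text{ and } B \text{ are d-separated by } C \text{ in } G \\
\text{Causal Minimality:}\quad
  & P \text{ satisfies the Markov condition for no proper subgraph of } G
\end{align*}
```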

Other parametric families, semi-parametric and non-parametric inference. There is a wide variety of causal inference algorithms that assume that the population distribution is either Gaussian or multinomial. Spirtes and former student Tillman devised a non-parametric test of conditional independence that could be used to extend constraint-based causal inference algorithms to arbitrary distributions. Ramsey has done research into more rapid non-parametric testing of conditional independence. Spirtes and Tillman have also considered non-Gaussian linear distributions and additive noise distributions, and shown that these families of distributions allow algorithms to reliably draw a richer set of causal conclusions than the Gaussian or multinomial cases allow.
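Why non-Gaussian linear models support richer conclusions can be seen in a two-variable toy case. The sketch below is an illustration in the spirit of that line of work, not the algorithm of Spirtes and Tillman: it assumes a hypothetical model Y = X + noise with uniform (non-Gaussian) variables, fits a regression in each direction, and uses the correlation of squared values as a crude check of whether the residuals are independent of the regressor. All names and coefficients are made up for the example.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def ols_residuals(regressor, target):
    """Residuals of the least-squares line target ~ slope * regressor + intercept."""
    n = len(regressor)
    mr, mt = sum(regressor) / n, sum(target) / n
    slope = (sum((r - mr) * (t - mt) for r, t in zip(regressor, target))
             / sum((r - mr) ** 2 for r in regressor))
    return [(t - mt) - slope * (r - mr) for r, t in zip(regressor, target)]


def dependence_score(regressor, residuals):
    """Crude dependence check: correlation between squared values.

    Independent residuals give a score near zero; a score far from zero
    signals dependence that ordinary correlation cannot see.
    """
    return pearson([r * r for r in regressor], [e * e for e in residuals])


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 20000
x = [rng.uniform(-1, 1) for _ in range(n)]       # hypothetical cause
y = [xi + rng.uniform(-1, 1) for xi in x]        # hypothetical effect: Y = X + noise

forward_score = dependence_score(x, ols_residuals(x, y))   # candidate model X -> Y
backward_score = dependence_score(y, ols_residuals(y, x))  # candidate model Y -> X

print(f"X -> Y residual dependence: {forward_score:.3f}")
print(f"Y -> X residual dependence: {backward_score:.3f}")
```

Only the true direction leaves residuals that look independent of the regressor. If both X and the noise were Gaussian, both scores would be near zero and the direction could not be recovered this way.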

Inference of causal theories with unmeasured variables. Unmeasured (latent) common causes are one of the main obstacles to reliable causal inference. Spirtes, in collaboration with Claassen, is exploring ways to significantly speed up the existing FCI algorithm. Spirtes, in collaboration with G. Cooper, is developing combined constraint-based/Bayesian algorithms that search for causal models and remain correct even with unmeasured variables. Glymour has worked with Alexander Murray-Watters (formerly an undergraduate, now a Master’s student) and Richard Scheines on developing procedures for identifying unobserved causes that operate on causal pathways between observed variables. This work is about to be applied to empirical data on cell signaling mechanisms.

Ockham’s Razor in Causal Discovery. In order to illustrate the relevance of retraction minimization to practical inquiry, Kelly and former Ph.D. student C. Mayo-Wilson have shown that any particular causal arrow inferred by a point-wise consistent causal method can be forced by nature to flip in orientation any number of times prior to convergence to the true orientation. Still, the basic strategy of basing causal conclusions on the outcomes of statistical tests is optimally retraction-efficient. Kelly and Mayo-Wilson performed extensive simulation studies with our colleagues’ causal search software to illustrate the mathematical results. They were repeatedly able to produce two successive orientation flips of a given causal connection, in accordance with the underlying theory.

Pooling distinct data sets. Spirtes and Tillman have explored the question of overlapping sets of variables. That is, if one has multiple datasets that measure overlapping sets of variables, what (if anything) can be learned about the causal structure underlying all of the measured variables? The standard answers in statistics all require strong assumptions. Spirtes and Tillman have improved algorithms devised by Glymour, Danks and Tillman.

Causal factor analysis. da Silva, Scheines, Glymour, and Spirtes previously developed an algorithm, BuildPureClusters, that is provably reliable in learning a “measurement model,” i.e., the set of latent variables that underlie a given set of measured variables, as long as each cluster is caused by at most one latent variable. Scheines, Spirtes, Ramsey, and Ph.D. students Kummerfeld and Yang have extended the BuildPureClusters algorithm so that it can reliably find the set of latent variables that underlie a given set of measured variables even when a cluster of variables is caused by multiple latents.

Time Series. Danks, in collaboration with Sergey Plis (Mind Research Network, University of New Mexico), has focused on learning from time series data that is undersampled relative to the true causal timescale. For example, communication between neurons happens relatively rapidly (on the order of 100 ms), but fMRI measurements are typically much slower (on the order of two seconds). Danks & Plis first showed that this undersampling (measuring the system more slowly than the underlying processes operate) can significantly impair causal structure learning: causal connections can be missed, spurious connections can be added, and the proper causal direction can be reversed. They then proved a set of theorems about how causal systems (appear to) change under different types of undersampling. They are currently using those theorems to develop causal structure learning algorithms that extract the structure at the causal timescale (or at least, as much of it as possible) from the undersampled time series data.
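The effect of undersampling can be sketched in a simple linear setting. The example below is an illustration only and is not the Danks & Plis framework: it assumes a hypothetical three-variable chain in which each variable depends linearly on the previous time step, so that observing only every second step corresponds to squaring the transition matrix. The matrix A, its coefficients, and the helper names are assumptions made for the illustration.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def edges(matrix, tol=1e-12):
    """Set of (cause, effect) pairs with a nonzero lagged coefficient.

    matrix[i][j] is the effect of variable j at one step on variable i at the next.
    """
    return {(j, i)
            for i, row in enumerate(matrix)
            for j, value in enumerate(row)
            if abs(value) > tol}


# Hypothetical chain at the true causal timescale: X0 -> X1 -> X2.
A = [
    [0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0],  # X1 at t+1 depends on X0 at t
    [0.0, 0.8, 0.0],  # X2 at t+1 depends on X1 at t
]

# Measuring every second step: the lag-one dependence in the observed
# series is governed by A squared.
A_undersampled = matmul(A, A)

true_edges = edges(A)
observed_edges = edges(A_undersampled)
missed = true_edges - observed_edges      # real connections no longer visible
spurious = observed_edges - true_edges    # connections that only appear to be direct

print("true edges:    ", sorted(true_edges))
print("observed edges:", sorted(observed_edges))
print("missed:        ", sorted(missed))
print("spurious:      ", sorted(spurious))
```

In this toy case both true connections disappear from the lagged structure and an apparently direct X0 -> X2 connection takes their place, which matches two of the failure modes described above.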

Efficient experimental design. Scheines and former PhD student Frederick Eberhardt, together with Patrik Hoyer, have examined the problem of causal inference from sequences of experiments. They considered learning linear cyclic causal models, and established worst-case bounds on successful causal inference that make causal discovery on large-scale systems such as genetic regulatory networks experimentally possible. Eberhardt and Hoyer have extended this work to include latent variables.

Changing Causal Structure. Danks, in collaboration with Erich Kummerfeld (current Ph.D. student), has focused on situations in which the causal structure can potentially change without warning or signal. For example, the brake line in one's car might break, so that the Pedal --> Brakes causal connection is suddenly absent. Alternatively, the changes can be slower, as when one's laptop battery gradually loses the ability to hold a charge. Kummerfeld & Danks developed a novel algorithm (LoSST) for causal structure learning in these types of situations. The LoSST algorithm performs similarly to standard causal structure learning algorithms when the causal structure is actually stable. But if the causal structure does change, the LoSST algorithm quickly identifies that a change has occurred (using only the observed data) and rapidly learns the new causal structure. In the course of this research, Kummerfeld & Danks also identified a novel methodological challenge. One natural (though previously undiscussed) methodological virtue is vigilance: if the world changes, then the method has a non-zero probability of recognizing that change within a known time period. A well-known methodological virtue is consistency: if the world does not change, then the method converges to the true structure in the infinite limit. Kummerfeld & Danks showed that no statistical estimator can be both consistent and vigilant; learning methods can provably learn the structure of a stable world or provably protect themselves against a changing world, but not both.

Applications. Several faculty members (Glymour, Ramsey, Spirtes, Danks) have been engaged for the past several years in investigations into the causal analysis of functional magnetic resonance imaging (fMRI), along with collaborators Russell Poldrack at the University of Texas at Austin, Steve Hanson at Rutgers University, and Sergey Plis at the University of New Mexico. The goal of the research is to determine networks of causal relations in the brain from fMRI data. To this end, Glymour, Ramsey, and Spirtes have scaled up, modified, or developed a number of algorithms. To search for undirected causal connections, Glymour and Ramsey devised IMaGES, a multi-subject version of a Bayesian search algorithm originally developed by Chris Meek, a CMU Philosophy graduate student. IMaGES allows for search from data for multiple subjects simultaneously, and they showed its accuracy for inferring brain mechanisms from brain scans. The procedure has found several empirical applications and a textbook endorsement. Ramsey, Spirtes, and Glymour have also modified the PC algorithm for the same purpose. They have also developed a number of algorithms that use non-Gaussianity to orient causal connections for fMRI data, given causal connections from another algorithm. They are presently refining and extending these algorithms and working to apply Bayes net methods to further theoretical problems arising from the analysis of fMRI data. Results by Danks and Plis suggest that there are limits to be placed on the learning of graphs from undersampled data, a theoretical constraint that needs to be borne in mind for fMRI, since the rate of measurement of samples (1-3 seconds) is far slower than the rate at which causal processes happen (100 ms).

In collaboration with behavioral economists Cynthia Cryder and George Loewenstein, Scheines has applied causal discovery techniques to models of charitable giving, finding that the perceived impact of one’s donation screens off the sympathy felt for those in need. Scheines and educational researcher Martina Rau have applied causal discovery methods to log files of student behavior on a fractions tutor, finding evidence to support the idea that teaching elementary students to conceptually understand fractions prior to becoming fluent in fraction arithmetic is more effective at producing learning than vice versa.

Scheines, Glymour, and Danks organized an important workshop on scientific applications of the causal discovery techniques that Spirtes, Glymour, and Scheines (SGS) pioneered in the 1990s. Three economists, four brain researchers, a biologist, a climate scientist, four genetics researchers, and a few educational researchers all presented compelling examples in which real scientific progress was clearly achieved largely due to the use of causal search technology. This community will hopefully grow, and will continue to push theoreticians to provide machine learning technology that is of real use to practicing scientists.

Evidential Coherence and Causation. In collaboration with Greg Wheeler, Scheines investigated the ways in which the causal relations among a hypothesis and the evidence that might bear on it mediate the relationship between "evidential coherence" and "confirmation," both of which can be modeled using probabilistic relations. Under the assumption that these relations are reasonable models of these philosophical notions (which is not necessarily our view), it turns out that for certain classes of models the causal structure tells the entire story, and for others the causal story along with bounds allows one to say quite a lot about how "coherence" and "confirmation" relate. This work was recently published in Mind.

Mis-specification. In collaboration with psychometricians at UCLA, Scheines researched the interaction between measurement model mis-specification and bias in estimating causal relationships between latent variables. Surprisingly, using measurement models mis-specified as unidimensional often involves very low bias when estimating causal parameters. Under reasonable assumptions, statistics can be derived to indicate when a researcher is likely to be in such a situation. This research was also extended to investigate whether multi-dimensional measurement models could be built reliably using existing clustering algorithms; the answer is no if the multi-dimensionality is even moderately complex.

Variable Construction. Scheines, Danks and PhD student Steve Fancsali investigated the relationship between the goals of scientific investigation (predictive or causal) and the sorts of variables one should construct or define from raw data logs that are not yet even random variables. If one's goal is prediction, the task is simple, but if one's goal is causal, there is a decision-theoretic tension between finding variables that are a) strong predictors of a target but not knowably causes, or b) weaker predictors, but more knowably causes.

Testing. Spirtes has served on the board of Chalearn, an organization devoted to setting up open competitions for analyzing data in order to make causal inferences. Several such competitions have been successfully run in the last year.

Policy. Scheines has served on several National Academy of Sciences committees that apply causal inference methodology to problems of national concern. In 2005, the Institute of Medicine systematically studied the effect of food marketing on the diets of children. Scheines constructed coding standards for causal inference and, along with committee members, applied them to over a hundred published studies. In 2007, the IOM’s Committee on the Evaluation of the VA’s Presumptive Disability Decision-Making Process was asked to construct a modern framework for reviewing the scientific evidence for whether wartime exposure caused veteran disease. In 2012-14, the National Research Council was asked to review the EPA’s IRIS Process, a system that the EPA uses to identify hazard and assess risk in non-pesticide chemicals. In both of these committees, Scheines led the effort to construct a framework for using diverse bases of evidence (experimental, observational, animal studies) for and against causation.