CMU-CLeaR Group | Benchmarks

We believe the community would benefit from a maintained list of (real or pseudo-real) benchmark datasets. Below are benchmarks for which either a “true” model has been reported, or background knowledge or common sense suggests that certain causal relations should or should not exist. If you know of others, please contact us with suggestions. We will soon add references to the ground truth and usage instructions.

Infant Health and Development Program (IHDP) data: interventional data, available in the supplementary material.
Churn for bank customers dataset
Telco customer churn
Abalone data
Boston housing data
Sachs data
Galton’s family data on human stature (a preprocessed version is also available).
Dropouts data: correlation matrix given in Fig. 3.
Big Five Personality data: includes latent personality variables.
Teacher Burnout data: see Fig. 6.2 in B. M. Byrne (2013) for the hypothesized causal graph, and our GIN and TIN papers for results of learning latent variables on this dataset.
Example-Causal-Datasets: a repository maintained by Joseph Ramsey.